I.song - Probability and Random Variables
I.song - Probability and Random Variables
So Ryoung Park
Seokho Yoon
Probability
and Random
Variables: Theory
and Applications
Probability and Random Variables: Theory
and Applications
Iickho Song · So Ryoung Park · Seokho Yoon
Seokho Yoon
College of Information and Communication
Engineering
Sungkyunkwan University
Suwon, Korea (Republic of)
Translation from the Korean language edition: “Theory of Random Variables” by Iickho Song ©
Saengneung 2020. Published by Saengneung. All Rights Reserved.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To
our kin and academic ancestors and families
and
to
all those who appreciate and enjoy
the beauty of thinking and learning
To
Professors
Souguil J. M. Ann,
Myung Soo Cha,
Saleem A. Kassam, and
Jordan M. Stoyanov
for their invisible yet enlightening guidance
Preface
This book is a translated version, with some revisions, from Theory of Random
Variables, originally written in Korean by the first author in the year 2020. This book
is intended primarily for those who try to advance one step further beyond the basic
level of knowledge and experience on probability and random variables. At the same
time, this book would also be a good resource for experienced scholars to review and
refine familiar concepts. For these purposes, the authors have included definitions
of basic concepts in clear terms, key advanced concepts in mathematics, and diverse
concepts and notions of probability and random variables with a significant number
of examples and exercise problems.
The organization of this book is as follows: Chap. 1 describes the theory of sets and
functions. The unit step function and impulse function, to be used frequently in the
following chapters, are also discussed in detail, and the gamma function and binomial
coefficients in the complex domain are introduced. In Chap. 2, the concept of sigma
algebra is discussed, which is the key for defining probability logically. The notions
of probability and conditional probability are then discussed, and several classes of
widely used discrete and continuous probability spaces are introduced. In addition,
important notions of probability mass function and probability density function are
described. After discussing another important notion of cumulative distribution func-
tion, Chap. 3 is devoted to the discussion on the notions of random variables and
moments, and also for the discussion on the transformations of random variables.
In Chap. 4, the concept of random variables is generalized into random vectors,
also referred to as joint random variables. Transformations of random vectors are
discussed in detail. The discussion on the applications of the unit step function and
impulse function in random vectors in this chapter is a unique trait of this book.
Chapter 5 focuses on the discussion of normal random variables and normal random
vectors. The explicit formula of joint moments of normal random vectors, another
uniqueness of this book, is delineated in detail. Three statistics from normal samples
and three classes of impulsive distributions are also described in this chapter. In
Chap. 6, the authors briefly describe the fundamental aspects of the convergence of
random variables. The central limit theorem, one of the most powerful and useful
results with practical applications, is among the key expositions in this chapter.
vii
viii Preface
The uniqueness of this book includes, but is not limited to, interesting applications
of impulse functions to random vectors, exposition of the general formula for the
product moments of normal random vectors, discussion on gamma functions and
binomial coefficients in the complex space, detailed procedures to the final answers
for almost all results presented, and a substantially useful and extensive index for
finding subjects more easily. A total of more than 320 exercise problems are included,
of which a complete solution manual for all the problems is available from the authors
through the publisher.
The authors feel sincerely thankful that, as is needed for the publication of any
book, the publication of this book became a reality thanks to a huge amount of
help from many people to the authors in a variety of ways. Unfortunately, the
authors could mention only some of them explicitly: to the anonymous reviewers
for constructive and helpful comments and suggestions, to Bok-Lak Choi and
Seung-Ki Kim at Saengneung for allowing the use of the original Korean title,
to Eva Hiarapi and Yogesh Padmanaban at Springer Nature for extensive editorial
assistance, and to Amelia Youngwha Song Pegram and Yeonwha Song Wratil for
improving the readability. In addition, the research grant 2018R1A2A1A05023192
from Korea Research Foundation was an essential support in successfully completing
the preparation of this book.
The authors would feel rewarded if everyone who spends time and effort wisely in
reading and understanding the contents of this book enjoys the pleasure of learning
and advancing one step further.
Thank you!
1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Set Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.3 Laws of Set Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.4 Uncountable Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.1 One-to-One Correspondence . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.2.2 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.3 Continuity of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3.1 Continuous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3.2 Discontinuities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3.3 Absolutely Continuous Functions and Singular
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.4 Step, Impulse, and Gamma Functions . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.4.1 Step Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.4.2 Impulse Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.4.3 Gamma Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.5 Limits of Sequences of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
1.5.1 Upper and Lower Limits of Sequences . . . . . . . . . . . . . . . . . . 55
1.5.2 Limit of Monotone Sequence of Sets . . . . . . . . . . . . . . . . . . . . 57
1.5.3 Limit of General Sequence of Sets . . . . . . . . . . . . . . . . . . . . . . 59
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2 Fundamentals of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.1 Algebra and Sigma Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.1.1 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.1.2 Sigma Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.2 Probability Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
ix
x Contents
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
6 Convergence of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
6.1 Types of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
6.1.1 Almost Sure Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
6.1.2 Convergence in the Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
6.1.3 Convergence in Probability and Convergence
in Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
6.1.4 Relations Among Various Types of Convergence . . . . . . . . . 422
6.2 Laws of Large Numbers and Central Limit Theorem . . . . . . . . . . . . . 425
6.2.1 Sum of Random Variables and Its Distribution . . . . . . . . . . . 426
6.2.2 Laws of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
6.2.3 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
Sets and functions are key concepts that play an important role in understanding
probability and random variables. In this chapter, we discuss those concepts that will
be used in later chapters.
In this section, we introduce and review some concepts and key results in the theory
of sets (Halmos 1950; Kharazishvili 2004; Shiryaev 1996; Sommerville 1958).
1.1.1 Sets
Definition 1.1.1 (abstract space) The collection of all entities is called an abstract
space, a space, or a universal set.
Definition 1.1.2 (element) The smallest unit that comprises an abstract space is
called an element, a point, or a component.
‘tall people’ is not a set because ‘tall’ is not mathematically clear. Yet, in fuzzy set
theory, such a vague collection is also regarded as a set by adopting the concept of
membership function.
Abstract spaces and sets are often represented with braces { } with all elements
explicitly shown, e.g. {1, 2, 3}; with the property of the elements described, e.g.,
{ω : 10 < ω < 20π}; {ai }; or {ai }i=1
n
.
Example 1.1.1 The result of signal processing in binary digital communication can
be represented by the abstract space Ω = {0, 1}. The collection {A, B, . . . , Z } of
capital letters of the English alphabet and the collection S = {(0, 0, . . . , 0), (0, 0,
. . . , 1), . . . , (1, 1, . . . , 1)} of binary vectors are also abstract spaces. ♦
Example 1.1.2 In the abstract space Ω = {0, 1}, 0 and 1 are elements. The abstract
space of seven-dimensional binary vectors contains 27 = 128 elements. ♦
Example 1.1.3 The set A = {1, 2, 3, 4} can also be depicted as, for example, A =
{ω : ω is a natural number smaller than 5}. ♦
Definition 1.1.4 (point set) A set with a single point is called a point set or a singleton
set.
Example 1.1.4 The sets {0}, {1}, and {2} are point sets. ♦
Consider an abstract space Ω and a set G of elements from Ω. When the element
ω does and does not belong to G, it is denoted by
ω ∈ G (1.1.1)
Example 1.1.9 The set B = {0, 1} is a proper set of A = {0, 1, 2, 3}; that is, B ⊂ A.
♦
A = B A ⊆ B, B ⊆ A. (1.1.2)
In other words, two sets A and B are equal if and only if A ⊆ B and B ⊆ A.
As we can later see in the proof of Theorems 1.1.4 and 1.1.1 is useful especially
for proving the equality of two sets.
Definition 1.1.8 (empty set) A set with no point is called an empty set or a null set,
and is denote by ∅ or { }.
Note that the empty set ∅ = { } is different from the point set {0} composed of one
element 0. One interesting property of the empty set is shown in the theorem below.
Example 1.1.10 For the sets A = {0, 1, 2, 3} and B = {1, 5}, we have ∅ ⊆ A and
{ } ⊆ B. ♦
Definition 1.1.9 (finite set; infinite set) A set with a finite or an infinite number of
elements is called a finite or an infinite set, respectively.
Definition 1.1.10 (set of natural numbers; set of integers; set of real numbers) We
will often denote the sets of natural numbers, integers, and real numbers by
and
respectively.
Example 1.1.11 The set {1, 2, 3} is a finite set and the null set { } = ∅ is also a
finite set. The set {ω : ω is a natural number, 0 < ω < 10} is a finite set and {ω :
ω is a real number, 0 < ω < 10} is an infinite set. ♦
4 1 Preliminaries
Definition 1.1.11 (interval) An infinite set composed of all the real numbers between
two distinct real numbers is called an interval or an interval set.
Definition 1.1.12 (collection of sets) When all the elements of a ‘set’ are sets, the
‘set’ is called a set of sets, a class of sets, a collection of sets, or a family of sets.
A class, collection, and family of sets are also simply called class, collection,
and family, respectively. A collection with one set is called a singleton collection. In
some cases, a singleton set denotes a singleton collection similarly as a set sometimes
denotes a collection.
Example 1.1.14 When A = {1, 2}, B = {2, 3}, and C = { }, the set D = {A, B, C}
is a collection of sets. The set E = {(1, 2], [3, 4)} is a collection of sets. ♦
Example 1.1.15 Assume the sets A = {1, 2}, B = {2, 3}, C = {4, 5}, and D =
{{1, 2}, {4, 5}, 1, 2, 3}. Then, A ⊆ D, A ∈ D, B ⊆ D, B ∈
/ D, C D, and C ∈ D.
Here, D is a set but not a collection of sets. ♦
Example 1.1.16 The collection A = {{3}} and B = {{1, 2}} are singleton collec-
tions and C = {{1, 2}, {3}} is not a singleton collection. ♦
Definition 1.1.13 (power set) The class of all the subsets of a set is called the power
set of the set. The power set of Ω is denoted by 2Ω .
Example 1.1.17 The power set of Ω = {3} is 2Ω = {∅, {3}}. The power set of Ω =
{4, 5} is 2Ω = {∅, {4}, {5}, Ω}. For a set with n elements, the power set is a collection
of 2n sets. ♦
1.1 Set Theory 5
Definition 1.1.14 (complement) For an abstract space Ω and its subset A, the com-
plement of A, denoted by Ac or A, is defined by
Ac = {ω : ω ∈
/ A, ω ∈ Ω}. (1.1.6)
Figure 1.1 shows a set and its complement via a Venn diagram.
Example 1.1.18 It is easy to see that Ω c = ∅ and (B c )c = B for any set B. ♦
Example 1.1.19 For the abstract space Ω = {0, 1, 2, 3} and B = {0, 1}, we have
B c = {2, 3}. The complement of the interval1 A = (−∞, 1] is Ac = (1, ∞). ♦
Definition 1.1.15 (union) The union or sum, denoted by A ∪ B or A + B, of two
sets A and B is defined by
A∪B = A+B
= {ω : ω ∈ A or ω ∈ B}. (1.1.7)
That is, A ∪ B denotes the set of elements that belong to at least one of A and B.
Figure 1.2 shows the union of A and B via a Venn diagram. More generally, the
union of {Ai }i=1
n
is2 denoted by
n
∪ Ai = A1 ∪ A 2 ∪ · · · ∪ A n . (1.1.8)
i=1
1 Because an interval assumes the set of real numbers by definition, it is not necessary to specify
the abstract space when we consider an interval.
2 We often use braces also to denote a number of items in a compact way. For example, {A }n
i i=1
here represents A1 , A2 , . . .,An .
6 1 Preliminaries
A B
A B
That is, A ∩ B denotes the set of elements that belong to both A and B simultaneously.
The Venn diagram for the intersection of A and B is shown in Fig. 1.3. Meanwhile,
n
∩ Ai = A1 ∩ A 2 ∩ · · · ∩ A n (1.1.10)
i=1
Example 1.1.26 The sets C = {1, 2, 3} and D = {4, 5} are mutually exclusive. The
sets A = {1, 2, 3, 4} and B = {4, 5, 6} are not mutually exclusive because A ∩ B =
{4} = ∅. The intervals [1, 3) and [3, 5] are mutually exclusive, and [3, 4] and [4, 5]
are not mutually exclusive. ♦
and
(disjoint) : Ai ∩ A j = ∅ (1.1.12)
Example 1.1.27 When A = {1, 2}, the collection {{1}, {2}} is a partition of A.
Each of the five collections {{1}, {2}, {3}, {4}}, {{1}, {2}, {3, 4}}, {{1}, {2, 3}, {4}},
{{1, 2}, {3}, {4}}, and {{1, 2}, {3, 4}} is a partition of B = {1, 2, 3, 4} while neither
{{1, 2, 3}, {3, 4}} nor {{1, 2}, {3}} is a partition of B. ♦
Example 1.1.28 The collection {A, ∅} is a partition of A, and {[3, 3.3), [3.3, 3.4],
(3.4, 3.6], (3.6, 4)} is a partition of the interval [3, 4). ♦
Example 1.1.29 For A = {1, 2, 3}, obtain all the partitions without the null set.
A B
A − B = {ω : ω ∈ A and ω ∈
/ B}. (1.1.13)
A − B = A ∩ Bc
= A − AB. (1.1.14)
Example 1.1.30 For A = {1, 2, 3} and B = {0, 1}, we have A − B = {2, 3} and
B − A = {0}. The differences between the intervals [1, 3) and (2, 5] are [1, 3) −
(2, 5] = [1, 2] and (2, 5] − [1, 3) = [3, 5]. ♦
AB = (A − B) ∪ (B − A)
= A ∩ B c ∪ Ac ∩ B
= (A ∪ B) − (A ∩ B) . (1.1.15)
Figure 1.6 shows the symmetric difference AB via a Venn diagram.
Example 1.1.32 For A = {1, 2, 3, 4} and B = {4, 5, 6}, we have AB = {1, 2, 3} ∪
{5, 6} = {1, 2, 3, 4, 5, 6} − {4} = {1, 2, 3, 5, 6}. The symmetric difference between
the intervals [1, 3) and (2, 5] is [1, 3)(2, 5] = ([1, 3) − (2, 5]) ∪ ((2, 5] − [1, 3)) =
[1, 2] ∪ [3, 5]. ♦
1.1 Set Theory 9
A B
Example 1.1.33 For any set A, we have AA = ∅, A∅ = ∅A = A, AΩ =
ΩA = Ac , and AAc = Ac A = Ω. ♦
Example 1.1.35 (Sveshnikov 1968) Show that every element of A1 ΔA2 Δ · · · ΔAn
belongs to only an odd number of the sets {Ai }i=1
n
.
Interestingly, the set operation similar to the addition of numbers is not the union of
sets but rather the symmetric difference (Karatowski and Mostowski 1976): subtrac-
tion is the inverse operation for addition and symmetric difference is its own inverse
operation while no inverse operation exists for the union of sets. More specifically, for
two sets A and B, there exists only one C, which is C = AB, such that AC = B.
This is clear because A(AB) = B and AC = B.
Theorem 1.1.3 For the operations of union and intersection, the following laws
apply:
1. Commutative law
A∪B = B∪ A (1.1.16)
A∩B = B∩ A (1.1.17)
10 1 Preliminaries
2. Associative law
(A ∪ B) ∪ C = A ∪ (B ∪ C) (1.1.18)
(A ∩ B) ∩ C = A ∩ (B ∩ C) (1.1.19)
3. Distributive law
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) (1.1.20)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C) (1.1.21)
Note that the associative and distributive laws are for the same and different types
of operations, respectively.
Example 1.1.36 Assume three sets A = {0, 1, 2, 6}, B = {0, 2, 3, 4}, and C =
{0, 1, 3, 5}. It is easy to check the commutative and associative laws. Next, (1.1.20)
holds true as it is clear from (A ∪ B) ∩ C = {0, 1, 2, 3, 4, 6} ∩ {0, 1, 3, 5} = {0, 1, 3}
and (A ∩ C) ∪ (B ∩ C) = {0, 1} ∪ {0, 3} = {0, 1, 3}. In addition, (1.1.21) holds
true as it is clear from (A ∩ B) ∪ C = {0, 2} ∪ {0, 1, 3, 5} = {0, 1, 2, 3, 5} and
(A ∪ C) ∩ (B ∪ C) = {0, 1, 2, 3, 5, 6} ∩ {0, 1, 2, 3, 4, 5} = {0, 1, 2, 3, 5}. ♦
Generalizing (1.1.20) and (1.1.21) for a number of sets, we have
n n
B∩ ∪ Ai = ∪ (B ∩ Ai ) (1.1.22)
i=1 i=1
and
n n
B∪ ∩ Ai = ∩ (B ∪ Ai ) , (1.1.23)
i=1 i=1
respectively.
Theorem 1.1.4 When A1 , A2 , . . ., An are subsets of an abstract space S, we have
c
n n
∪ Ai = S− ∪ Ai
i=1 i=1
n
= ∩ (S − Ai )
i=1
n
= ∩ Aic (1.1.24)
i=1
and
c
n n
∩ Ai = ∪ Aic . (1.1.25)
i=1 i=1
1.1 Set Theory 11
c
n n
∪ Ai ⊆ ∩ Aic . (1.1.26)
i=1 i=1
n
Next, assume x ∈ ∩ Aic . Then, x ∈ Aic , and therefore x is not an element of Ai
i=1 c
n n
for any i: in other words, x ∈
/ ∪ Ai . This implies x ∈ ∪ Ai . Therefore, we
i=1 i=1
have
c
n n
∩ Aic ⊆ ∪ Ai . (1.1.27)
i=1 i=1
Example 1.1.37 Consider S = {1, 2, 3, 4} and its subsets A1 = {1}, A2 = {2, 3},
and A3 = {1, 3, 4}. Then, we have (A1 + A2 )c = {1, 2, 3}c = {4}, which is the same
as Ac1 Ac2 = {2, 3, 4} ∩ {1, 4} = {4}. Similarly, we have (A1 A2 )c = ∅c = S, which
is the same as Ac1 + Ac2 = {2, 3, 4} ∪ {1, 4} = S. In addition, (A2 + A3 )c = S c = ∅
is the same as Ac2 Ac3 = {1, 4} ∩ {2} = ∅. Finally, (A1 A3 )c = {1}c = {2, 3, 4} is the
same as Ac1 + Ac3 = {2, 3, 4} ∪ {2} = {2, 3, 4}. ♦
Example 1.1.39 The sets {1, 2, 3} and {1, 10, 100, 1000} are both countable because
a one-to-one correspondence can be established between these two sets and the sub-
sets {1, 2, 3} and {1, 2, 3, 4}, respectively, of J+ . ♦
Example 1.1.40 The set J of integers is countable because we can establish a one-
to-one correspondence as
0 −1 1 −2 2 · · · −n n · · ·
··· ··· (1.1.28)
1 2 3 4 5 · · · 2n 2n + 1 · · ·
between J and J+ . Similarly, it is easy to see that the sets {ω : ω is a positive even
number} and {2, 4, . . . , 2n , . . .} are countable sets by noting the one-to-one corre-
spondences 2n ↔ n and 2n ↔ n, respectively. ♦
0
i =0:
1
1 1
i =1:− ,
1 1
(1.1.31)
2 1 1 2
i =2:− ,− , ,
1 2 2 1
3 2 1 1 2 3
i =3:− ,− ,− , , ,
1 2 3 3 2 1
..
.
Reading this sequence downward from the first row and ignoring repetitions, we will
have a one-to-one correspondence between the sets of rational numbers and natural
numbers.
(Method 2) Assume integers x = 0 and y, and denote the rational number xy
by the coordinates (x, y) on a two dimensional plane. Reading the integer coordinates
as (1, 0) → (1, 1) → (−1, 1) → (−1, −1) → (2, −1) → (2, 2) → · · · while
skipping a number if it had previously appeared, we have a one-to-one correspon-
dence between J+ and Q. ♠
Proof (1) For a finite set, it is obvious. For an infinite set, denote a countable set by
A = {a1 , a2 , . . .}. Then, a subset of A can be expressed as B = an 1 , an 2 , . . .
and we can find a one-to-one correspondence i ↔ ani between J+ and B.
(2) We can choose a countable subset {a1 , a2 , . . .} arbitrarily from an infinite set.
(3) Consider the sequence B1 , B2 , . . . defined as
B1 = A1
= b1 j , (1.1.32)
B2 = A2 − A1
= b2 j , (1.1.33)
B3 = A3 − (A1 ∪ A2 )
= b3 j , (1.1.34)
..
.
14 1 Preliminaries
∞ ∞
Clearly, B1 , B2 , . . . are mutually exclusive and ∪ Ai = ∪ Bi . Because Bi ⊆ Ai ,
i=1 i=1
the sets B1 , B2 , . . . are all countable from Property (1). Next, arrange the elements
of B1 , B2 , . . . as
and read them in the order as directed by the arrows, which represents a one-to-
∞ ∞
one correspondence between J+ and ∪ Bi = ∪ Ai .
i=1 i=1
♠
Property (3) of Theorem 1.1.6 also implies that a countably infinite union of
countable sets is a countable set.
Example 1.1.41
(Sveshnikov 1968) Show that the Cartesian product A1 × A2 ×
· · · × An = a1i1 , a2i2 , . . . , anin of a finite number of countable sets is countable.
Solution It suffices to show that the Cartesian product A × B is countable when
A and B are countable. Denote two countable sets by A = {a1 , a2 , . . .} and
B = {b1 , b2 , . . .}. If we arrange the elements of the Cartesian product A × B as
(a1 , b1 ) , (a1 , b2 ) , (a2 , b1 ) , (a1 , b3 ) , (a2 , b2 ) , (a3 , b1 ) , . . ., then it is apparent that
the Cartesian product is countable. ♦
Example 1.1.42 Show that the set of finite sequences from a countable set is count-
able.
Solution The set Bk of finite sequences with length k from a countable set A is
equivalent to the k-fold Cartesian product Ak = (b1 , b2 , . . . , bk ) : b j ∈ A of A.
Then, Bk is countable from Example 1.1.41. Next, the set of finite sequences is the
∞
countable union ∪ Bk , which is countable from (3) of Theorem 1.1.6. ♦
k=1
· · · 00 · · · → 1, · · · 11 · · · → 2, · · · 0101 · · · → 3,
· · · 1010 · · · → 4, · · · 001001 · · · → 5, · · · 010010 · · · → 6,
· · · 100100 · · · → 7, · · · 011011 · · · → 8, · · · 101101 · · · → 9, · · ·
between ΥT and J+ . ♦
As it has already been mentioned, finite sets are all countable. On the other hand,
some infinite sets are countable and some are uncountable.
Theorem 1.1.7 The interval set [0, 1] = R[0,1] = {x : 0 ≤ x ≤ 1}, i.e., the set of
real numbers in the interval [0, 1], is uncountable.
Proof We prove the theorem by contradiction. Letting ai j ∈ {0, 1, . . . , 9}, the ele-
ments of the set R[0,1] can be expressed as 0.ai1 ai2 · · · ain · · · . Assume R[0,1] is
countable: in other words, assume all the elements of R[0,1] are enumerated as
where ai j ∈ {0, 1}. Denote the complement of a binary digit x by x. Then, the
sequence (a11 a22 · · · ) produces a contradiction. Therefore, Υ is uncountable. ♦
1 2
k−1
ζ1k = k
+ cj (1.1.41)
3 j=0
3j
2 2
k−1
ζ2k = k
+ cj (1.1.42)
3 j=0
3j
The Cantor set C described in Example 1.1.46 has the following properties:
∞
(1) The set C can be expressed as C = ∩ Bi , where B1 = [0, 1], B2 = 0, 13 ∪
2
i=1
3
, 1 , B3 = 0, 19 ∪ 29 , 39 ∪ 69 , 79 ∪ 89 , 1 , . . ..
(2) The set C is an uncountable and closed set.
(3) The length of the union of the open intervals removed when obtaining C is
1
1
3
+ 322 + 343 + · · · = 1−3 2 = 1. Consequently, the length of C is 0.
3
(4) The set C is the set of ternary real numbers between 0 and 1 that can be repre-
sented without using 1. In other words, every element of C can be expressed as
∞
xn
3n
, xn ∈ {0, 2}.
n=1
In Sect. 1.3.3, the Cantor set is used as the basis for obtaining a singular function.
Example 1.1.47 (Gelbaum and Olmsted 1964) The Cantor set C considered in
Example 1.1.46 has a length 0. A Cantor set with a length greater than 0 can be
obtained similarly. For example, consider
[0, 1] and a constant α ∈ (0, 1].
the interval
In the first step, an open interval 21 − α4 , 21 + α4 of length α2 is removed. In the
second step, an open interval each of length α8 is removed at the center of the two
1.1 Set Theory 17
α
closed intervals remaining. In the third step, an open interval each of length 32 is
removed at the center of the four closed intervals remaining. . . .. Then, this Cantor
set is a set of length 1 − α because the sum of lengths of the regions removed is
α
2
+ α4 + α8 + · · · = α. A Cantor set with a non-zero length is called the Smith-
Volterra-Cantor set or fat Cantor set. ♦
Example 1.1.48 As shown in Table 1.1, the term countable set denotes a finite set
or a countably infinite set, and the term infinite set denotes a countably infinite set
or an uncountably infinite, simply called uncountable, set. ♦
Definition 1.1.24 (almost everywhere) In real space, when the length of the union
of countably many intervals is arbitrarily small, a set of points that can be contained
in the union is called a set of length 0. In addition, ‘at all points except for a set
of length 0’ is called ‘almost everywhere’, ‘almost always’, ‘almost surely’, ‘with
probability 1’, ‘almost certainly’, or ‘at almost every point’.
In the integer or discrete space (Jones 1982), ‘almost everywhere’ denotes ‘all
points except for a finite set’.
Example 1.1.49 The intervals [1, 2) and (1, 2) are the same almost everywhere.
The sets {1, 2, . . .} and {2, 3, . . .} are the same almost everywhere. ♦
Example 1.1.50 For the two sets A = {4, 2, 1, 9} and B = {8, 0, 4, 5}, we have
A ∼ B. ♦
g(x), x ∈ A0 ∪ B,
h(x) = (1.1.43)
x, x∈/ A0 ∪ B
Example 1.1.53 The set J+ of natural numbers is equivalent to the set J of integers.
The set of irrational numbers is equivalent to the set R of real numbers from Exercise
1.16. ♦
Example 1.1.54 It is interesting to note that the set of irrational numbers is not
closed under certain basic operations, such as addition and multiplication, while the
much smaller set of rational numbers is closed under such operations. ♦
Example 1.1.57 The set of real numbers R is uncountable. The intervals [a, b],
[a, b), (a, b], and (a, b) are all uncountable for any real number a and b > a from
Theorem 1.1.7 and Example 1.1.56. ♦
1.2 Functions
In this section, we will introduce and briefly review some key concepts within the
theory of functions (Ito 1987; Royden 1989; Stewart 2012).
Definition 1.2.1 (mapping) A relation f that assigns every element of a set Ω with
only one element of another set A is called a function or mapping and is often denoted
by f : Ω → A.
For the function f : Ω → A, the sets Ω and A are called the domain and
codomain, respectively, of f .
Example 1.2.1 Assume the domain Ω = [−1, 1] and the codomain A = [−2, 1].
The relation that connects all the points in [−1, 0) of the domain with −1 in the
codomain, and all the points in [0, 1] of the domain with 1 in the codomain is a
function. ♦
Example 1.2.2 Assume the domain Ω = [−1, 1] and the codomain A = [−2, 1].
The relation that connects all the points in [−1, 0) of the domain with −1 in the
codomain, and all the points in (0, 1] of the domain with 1 in the codomain is not
a function because the point 0 in the domain is not connected with any point in
the codomain. In addition, the relation that connects all the points in [−1, 0] of the
domain with −1 in the codomain, and all the points in [0, 1] of the domain with 1 in
the codomain is not a function because the point 0 in the domain is connected with
more than one point in the codomain. ♦
Definition 1.2.2 (set function) A function whose domain is a collection of sets is
called a set function.
Example 1.2.3 Let the domain be the power set 2C = {∅, {3}, {4}, {5}, {3, 4},
{3, 5}, {4, 5}, {3, 4, 5}} of C = {3, 4, 5}. Define a function f (B) for B ∈ 2C as
the number of elements in B. Then, f is a set function, and we have f ({3}) = 1,
f ({3, 4}) = 2, and f ({3, 4, 5}) = 3, for example. ♦
Ω f (Ω)
Definition 1.2.4 (range) For a function f : Ω → A, the image f (Ω) is called the
range of the function f .
The image f (G) of G ⊆ Ω and the range f (Ω) are shown in Figs. 1.8 and 1.9,
respectively.
Example 1.2.4 For the domain Ω = [−1, 1] and the codomain A = [−10,
10], con-
sider the function f (ω) = ω 2 . The image of the subset G 1 = − 21 , 21 of the domain
Ω is f (G 1 ) = [0, 0.25), and the image of G 2 = (0.1, 0.2) is f (G 2 ) = (0.01, 0.04).
♦
Example 1.2.5 The image of G = {{3}, {3, 4}} in Example 1.2.3 is f (G) = {1, 2}.
♦
Example 1.2.6 Consider the domain Ω = [−1, 1] and codomain A = [−2, 1].
Assume a function f for which all the points in [−1, 0) ⊆ Ω are mapped to
−1 ∈ A and all the points in [0, 1] ⊆ Ω are mapped to 1 ∈ A. Then, the range
f (Ω) = f ([−1, 1]) of f is {−1, 1}, which is different from the codomain A. In
Example 1.2.3, the range of f is {0, 1, 2, 3}. ♦
As we observed in Example 1.2.6, the range and codomain are not necessarily the
same.
f −1 (H ) = {ω : f (ω) ∈ H }, (1.2.2)
f −1 (H) H
Example 1.2.7 Consider the function f (ω) = ω 2 with domain Ω = [−1, 1] and
codomain A = [−10, 10]. The inverse image of a subset H1 = (−0.25, 1) of codomain
A is f −1 (H1 ) = (−1, 1), and the inverse image of H2 = (−0.25, 0) is f −1 (H2 ) =
f −1 ((−0.25, 0)) = ∅. ♦
Definition 1.2.6 (surjection) When the range and codomain of a function are the
same, the function is called an onto function, a surjective function, or a surjection.
If the range and codomain of a function are not the same, that is, if the range is a
proper subset of the codomain, then the function is called an into function.
Definition 1.2.7 (injection) When the inverse image for every element of the
codomain of a function has at most one element, i.e., when the inverse image for
every element of the range of a function has only one element, the function is called
an injective function, a one-to-one function, a one-to-one mapping, or an injection.
In Definition 1.2.7, ‘... function has at most one element, i.e., ...’ can be replaced
with ‘... function is a null set, a singleton set, or a singleton collection of sets, i.e., ...’,
and ‘... has only one element, ...’ with ‘... is a singleton set or a singleton collection
of sets, ...’.
Example 1.2.9 For the domain Ω = [−1, 1] and the codomain A = [0, 1], consider
the function f (ω) = ω 2 . Then, f is a surjective function because its range is the same
as the codomain, and f is not an injective function because, for any non-zero point
of the range, the inverse image has two elements. ♦
Example 1.2.10 For the domain Ω = [−1, 1] and the codomain A = [−2, 2], con-
sider the function f (ω) = ω. Then, because the range [−1, 1] is not the same as the
codomain, the function f is not a surjection. Because the inverse image of every
element in the range is a singleton set, the function f is an injection. ♦
Example 1.2.11 For the domain Ω = {{1}, {2, 3}} and the codomain A = {3, {4},
{5, 6, 7}}, consider the function f ({1}) = 3, f ({2, 3}) = {4}. Because the range
{3, {4}} is not the same as the codomain, the function f is not a surjection. Because
22 1 Preliminaries
the inverse image of every element in the range have only one4 element, the function
f is an injection. ♦
and
for subsets A and B of the domain and subsets C and D of the range.
Proof Let us show (1.2.5) only. First, when x ∈ f −1 (C ∪ D), we have f (x) ∈ C
or f (x) ∈ D. Then, because x ∈ f −1 (C) or x ∈ f −1 (D), we have x ∈ f −1 (C) ∪
4 The inverse image of the element {4} of the range is not {2, 3} but {{2, 3}}, which has only one
element {2, 3}.
1.2 Functions 23
and
m m
−1
f ∩ Ci = ∩ f −1 (Ci ) (1.2.10)
i=1 i=1
Example 1.2.13 For two elements a and b in the set R of real numbers, assume
the function d(a, b) = |a − b|. Then, we have |a − b| = |b − a|, |a − b| > 0 when
a = b, and |a − b| = 0 when a = b. We also have |a − c| + |c − b| ≥ |a − b| from
(|α| + |β|)2 − |α + β|2 = 2 (|α||β| − αβ) ≥ 0 for real numbers α and β. Therefore,
the function d(a, b) = |a − b| is a distance function. ♦
Example 1.2.14 For two elements a and b in the set R of real numbers, assume
the function d(a, b) = (a − b)2 . Then, we have (a − b)2 = (b − a)2 , (a − b)2 > 0
when a = b, and (a − b)2 = 0 when a = b. Yet, because (a − c)2 + (c − b)2 =
24 1 Preliminaries
−1 0 1 x
(a − b)2 + 2(c − a)(c − b) < (a − b)2 when a < c < b, the function d(a, b) =
(a − b)2 is not a distance function. ♦
Definition 1.2.12 (limit point; closure) A point p is called a limit point of a subset
E of a metric space if E contains at least one point different from p for every
neighborhood of p. The union Ē = E ∪ E L of E and the set E L of all the limit
points of E is called the closure or enclosure of E.
Definition 1.2.13 (support) The closure of the set {x : f (x) = 0} is called the sup-
port of the function f (x).
Example 1.2.18 The value of the function f (x) = sin x is 0 when x = nπ for n
integer. Yet, the support of f (x) is the set R of real numbers. ♦
1.3 Continuity of Functions 25
Definition 1.3.1 (continuous function) If, for every positive number and every
point x0 in a region S, there exists a positive number δ (x0 , ) such that
| f (x) − f (x0 )| < for all points x in S when |x − x0 | < δ (x0 , ), then the function
f is called continuous on S.
In other words, for some point x0 in S and some positive number , if there exists
at least one point x in S such that |x − x0 | < δ (x0 , ) yet | f (x) − f (x0 )| ≥ for
every positive number δ (x0 , ), the function f is not continuous on S.
Definition 1.3.2 (uniform continuity) If, for every positive number , there exists a
positive number δ () such that | f (x) − f (x0 )| < for all points x and x0 in a region
S when |x − x0 | < δ (), then the function f is called uniformly continuous on S.
In other words, if there exist at least one each of x and x0 in S for a positive number
such that |x − x0 | < δ () yet | f (x) − f (x0 )| ≥ for every positive number δ (),
then the function f is not uniformly continuous on S.
The difference between uniform continuity and continuity lies in the order of
choosing the numbers x0 , δ, and . Specifically, for continuity, x0 and are chosen
first and then δ (x0 , ) is chosen, and thus δ (x0 , ) is dependent on x0 and . On the
other hand, for uniform continuity, is chosen first, δ () is chosen next, and then x0
is chosen last, in which δ () is dependent only on and not on x or x0 . In short, the
dependence of δ on x0 is the key difference.
When a function f is uniformly continuous, we can make f (x1 ) arbitrarily close
to f (x2 ) for every two points x1 and x2 by moving these two points together. A
uniformly continuous function is always a continuous function, but a continuous
function is not always uniformly continuous. In other words, uniform continuity is
a stronger or more strict concept than continuity.
Example 1.3.2 For the function f (x) = x in S = R, let δ = with > 0. Then,
when |x − y| < δ, because | f (x) − f (y)| = |x − y| < δ = , f (x) √ = x is uni-
formly continuous. As shown in Exercise 1.23, the function f (x) = x is uniformly
continuous on the interval (0, ∞). The function f (x) = x1 is uniformly continuous
for all intervals (a, ∞) with a > 0. On the other hand, as shown in Exercise 1.22,
it is not uniformly continuous on the interval (0, ∞). The f (x) = tan x is
function
continuous but not uniformly continuous on the interval − π2 , π2 . ♦
Example 1.3.3 In the interval S = (0, ∞), consider f(x) = x 2 . For a positive num-
ber and a point x0 in S, let a = x0 + 1 and δ = min 1, 2a . Then, for a point x in
S, when |x − x0 | < δ, we have |x − x0 | < 1 and x < x0 + 1 = a because δ ≤ 1. We
also have x0 < a. Now, we have x 2 − x02 = (x + x0 ) |x − x0 | < 2aδ ≤ 2a 2a =
because δ ≤ 2a , and thus f is continuous. On the other hand, let = 1, assume a pos-
itive number δ, and
choose x0 = 1δ and x = x0 + 2δ . Then, we have |x − x0 | = 2δ < δ
2
but x − x02 = 1δ + 2δ − δ12 = 1 + δ4 > 1 = , implying that f is not uniformly
2 2
In Theorem 1.3.2, the inequality (1.3.2) and the number M are called the Lipschitz
inequality and Lipschitz constant, respectively.
1.3 Continuity of Functions 27
Theorem 1.3.3 If a function is differentiable and has a bounded derivative, then the
function is uniformly continuous.
1.3.2 Discontinuities
Definition 1.3.3 (type 1 discontinuity)
+When
the three
values f x + , f x − , and
f (x) all exist and at least one of f x and f x − is different from f (x), the
point x is called
a type1 discontinuity
point or a jump discontinuity point, and the
difference f x + − f x − is called the jump or saltus of f at x.
Definition 1.3.4 (type 2 discontinuity) If at least one of f x + and f x − does not
exist, then the point x is called a type 2 discontinuity point.
0 x
−1
28 1 Preliminaries
−1
is5 continuous almost everywhere: that is, f is continuous at all points except at
rational numbers. The discontinuities are all type 2 discontinuities. ♦
n
| f (bk ) − f (ak )| < ε (1.3.9)
k=1
n
when |bk − ak | < δ for every positive numbers c and ε, then the function f is
k=1
called an absolutely continuous function.
Example 1.3.10 The functions f 1 (x) = x 2 and f 2 (x) = sin x are both absolutely
continuous, and f 3 (x) = x1 is absolutely continuous for x > 0. ♦
If a function f (x) is absolutely continuous on a finite interval (a, b), then there
exists an integrable function f (x) satisfying
b
f (b) − f (a) = f (x)d x, (1.3.10)
a
where −∞ < a < b < ∞ and f (x) is the derivative of f (x) at almost every point.
The converse also holds true. Note that, if f (x) is not absolutely continuous, the
derivative does not satisfy (1.3.10) even when the derivative of f (x) exists at almost
every point.
Example 1.3.11 Denote by Di j the j-th interval that is removed at the i-th step in
the procedure of obtaining the Cantor set C in Example 1.1.46, where i = 1, 2, . . .
and j = 1, 2, . . . , 2i−1 . Draw 2n − 1 line segments
2j − 1
y= : x ∈ Di j ; j = 1, 2, . . . , 2i−1 ; i = 1, 2, . . . , n (1.3.11)
2i
30 1 Preliminaries
φ1 (x) φ2 (x)
1 1
3
−
4
1 1
− −
2 2
1
−
4
1 2 x 1 2 3 6 7 8 x
−
3
−
3 1 − −−
9 9 9
− −−
9 9 9 1
Fig. 1.14 The first two functions φ1 (x) and φ2 (x) of {φn (x)}∞
n=1 converging to the Cantor function
φC (x)
parallel to the x-axis on an (x, y) coordinate plane. Next, draw a straight line each
from the point (0, 0) to the left endpoint of the nearest line segment and from the point
(1, 1) to the right endpoint of the nearest line segment. For every line segment, draw a
straight line from the right endpoint to the left endpoint of the nearest line segment on
the right-hand side. Let the function resulting from this procedure be φn (x). Then,
φn (x) is continuous on the interval
(0, 1), and is composed of 2n line segments
−n 3 n
of height 2 and slope 2 connected with 2n − 1 horizontal line segments.
Figure 1.14 shows φ1 (x) and φ2 (x). The limit
x = 0.c1 c2 · · · (1.3.14)
in a ternary number. Now, the image of φC (x) is a subset of [0, 1]. In addition,
because the number
(1) The Cantor function φC (x) is a non-decreasing function with range [0, 1] and
no jump discontinuity. Because there can be no discontinuity except for jump
discontinuities in non-increasing and non-decreasing functions, φC (x) is a con-
tinuous function.
(2) Let E be the set of points represented by (1.1.41) and (1.1.42). Then, the function
φC (x) is an increasing function at x ∈ C − E and is constant in some neighbor-
hood of every point x ∈ [0, 1] − C.
(3) As observed in Example 1.1.46, the length of [0, 1] − C is 1, and φC (x) is
constant at x ∈ [0, 1] − C. Therefore, the derivative of φC (x) is 0 almost every-
where.
Example 1.3.12 (Salem 1943) The Cantor function φC (x) considered in Example
1.3.11 is a non-decreasing singular function. Obtain an increasing singular function.
Solution
Consider the line segment P Q connecting P(x, y) and Q (x + Δx , y+
Δ y on a two-dimensional plane, where Δx > 0 and Δ y > 0. Let the point R have
the coordinate x + Δ2x , y + λ0 Δ y with 0 < λ0 < 1. Denote the replacement of the
line segment P Q into two line segments P R and R Q by ‘transformation of P Q via
T (λ0 )’. Now, starting from the line segment O A between the origin O(0, 0) and the
point A(1, 1), consider a sequence { f n (x)}∞
n=0 defined by
m
k−1
y = θk λθ j , (1.3.16)
k=1 j=1
32 1 Preliminaries
f0 (x) f1 (x)
1 1
1 x 1 x
1
2
f2 (x) f3 (x)
1 1
1
4
2
4
3
4 1 x 1 2 3 4 5 6 7
8 8 8 8 8 8 8 1 x
k−1
where λ1 = 1 − λ0 and we assume λθ j = 1 when k = 1. The limit f (x) =
j=1
lim f m (x) of the sequence { f m (x)}∞
m=1 is an increasing singular function. ♦
m→∞
Let us note that the convolution6 of two absolutely continuous functions always
results in an absolutely continuous function while the convolution of two singu-
lar functions may sometimes result not in a singular function but in an absolutely
continuous function (Romano and Siegel 1986).
In this section, we describe the properties of unit step, impulse (Challifour 1972;
Gardner 1990; Gelfand and Moiseevich 1964; Hoskins and Pinto 2005; Kanwal 2004;
Lighthill 1980), and gamma functions (Artin 1964; Carlson 1977; Zayed 1996) in
detail.
∞ ∞
6The integral −∞ g(x − v) f (v)dv = −∞ g(v) f (x − v)dv is called the convolution of f and g,
and is usually denoted by f ∗ g or g ∗ f .
1.4 Step, Impulse, and Gamma Functions 33
is called the unit step function, step function, or Heaviside function and is also
denoted by H (x).
In (1.4.1), the value u(0) is not defined: usually, u(0) is chosen as 0, 21 , 1, or any
value u 0 between 0 and 1. Figure 1.16 shows the unit step function with u(0) = 21 .
In some cases, the unit step function with value α at x = 0 is denoted by u α (x), with
u − (x), u(x), and u + (x) denoting the cases of α = 0, 21 , and 1, respectively. The unit
step function can be regarded as the integral of the impulse or delta function that will
be considered in Sect. 1.4.2.
The unit step function u(x) with u(0) = 21 can be represented as the limit
1 1
u(x) = lim + tan−1 (αx) (1.4.2)
α→∞ 2 π
or
1
u(x) = lim (1.4.3)
α→∞ 1 + e−αx
As we have observed in (1.4.2) and (1.4.3), the unit step function can be defined
alternatively by first introducing step-convergent sequence, also called the Heaviside
0 x
34 1 Preliminaries
a sequence {h m (x)}∞
m=1 of real functions that satisfy
∞
lim f (x), h m (x) = f (x)d x
m→∞ 0
= f (x), u(x) (1.4.6)
for every sufficiently smooth function f (x) in the interval −∞ < x < ∞ is called
a step-convergent sequence or a step sequence, and its limit
and
u x 2 = u (|x|)
u(0), x = 0,
= (1.4.10)
1, x = 0
7 When we also take complex functions into account, the notation a(x), b(x) is defined as
∞
a(x), b(x) = −∞ a(x)b∗ (x)d x.
8 In (1.4.8), it is implicitly assumed u(0) = 0 or 1.
1.4 Step, Impulse, and Gamma Functions 35
In addition, we have
t
min(t, s) = u(s − y)dy (1.4.12)
0
t, t ≥ s,
for t ≥ 0 and s ≥ 0. Similarly, the max function max(t, s) = can be
s, t ≤ s
expressed as
for t ≥ 0 and s ≥ 0. ♦
The unit step function is also useful in expressing piecewise continuous functions
as single-line formulas.
∞ 1
shown in Fig. 1.17. Then, we have lim −∞ f (x)h m (x)d x = lim 2m
− 2m
1
m→∞ m→∞
1
∞ ∞
mx + 21 f (x)d x + 1 f (x)d x = 0 f (x)d x = f (x), u(x) because −2m1
2m
2m1
2m
The unit step function we have described so far is defined in the continuous space.
In the discrete space, the unit step function can similarly be defined.
36 1 Preliminaries
1
− 2m 0 1 x
2m
Note that, unlike the unit step function u(x) in continuous space for which the
value u(0) is not defined uniquely, the value ũ(0) is defined uniquely as 1. In addition,
for any non-zero real number a, u(|a|x) is equal to u(x) except possibly at x = 0
while ũ(|a|x) and ũ(x) are different9 at infinitely many points when |a| < 1.
1.4.2.1 Definitions
0 2 t 0 1 t 0 0.5 t
shown in Fig. 1.19, is not a continuous function, and therefore not differentiable.
Yet, it is continuous and differentiable everywhere except at t = 0 and a. In addition,
the rectangular function is the derivative of the ramp function almost everywhere:
specifically, pa (t) = dtd ra (t) for t = 0, a. As we can observe in Fig. 1.20, the limit
of the ramp function ra (t) for a → 0 is the unit step function.
Consider the derivative of the unit step function u(t), the limit of the ramp
function ra (t). The order of operations is not always interchangeable in gen-
eral: yet, we interchange the order of the derivative and limit of ra (t). Specif-
ically, as shown in Figs. 1.21 and 1.22, the limit of the derivative of ra (t) can
be regarded as the derivative of the limit of ra (t). In other words, we can imag-
0 2 t 0 1 t 0 0.5 t
0 1 t 0 0.5 t 0 t
Fig. 1.20 Limit of the sequence {ra (t)} of ramp functions: from a ramp function to the unit step
function
38 1 Preliminaries
0 a t 0 t 0 t
limit differentiation
Fig. 1.21 Ramp function −→ step function −→ impulse function
ine d
dt
u(t) = d
dt
lim ra (t) = lim d
r (t)
dt a
= lim pa (t). Based on this simple yet
a→0 a→0 a→0
useful description, let us introduce the impulse function in more detail.
Definition 1.4.3 (impulse function) The derivative
du(x)
δ(x) = (1.4.20)
dx
of the unit step function u(x) is called an impulse function or a generalized function.
As we have already observed, the unit step function u(x) is not continuous at
x = 0 and therefore not differentiable. This implies that (1.4.20) is technically not
defined at x = 0: (1.4.20) can then be interpreted as “Let us define the ‘symbolic’
differentiation of u(x) as δ(x).” Clearly, the impulse function δ(x) is not defined at
x = 0: it is often assumed that10 δ(0) → ∞.
Let us next consider the second definition of the impulse function.
Definition 1.4.4 (impulse function) A function δ(x) satisfying the conditions
and
β
1, if α < c < β,
δ(x − c)d x = (1.4.22)
α 0, if α = β = c, c < α < β, or α < β < c
0 a t 0 a t 0 t
differentiation limit
Fig. 1.22 Ramp function −→ rectangular function −→ impulse function
for instance. Based on this observation, we can define the impulse function via a
proper sequence of functions. Let us first introduce the concept of impulse-convergent
sequence similarly as we introduced the step-convergent sequence.
Definition 1.4.5 (impulse-convergent sequence) When
∞
lim f (x)sm (x)d x = f (0) (1.4.25)
m→∞ −∞
is satisfied for every sufficiently smooth function f (x) over the interval −∞ < x <
∞, the sequence {sm (x)}∞ m=1 is called an impulse sequence, an impulse-convergent
sequence, or a delta-convergent sequence.
sin mx 1 m ∞ m −m|x| ∞ ∞
Example 1.4.6 The sequences πx
, π 1+m 2 x 2 m=1
, e ,
∞
m=1 2 m=1
2 ∞
m
π(emx +e−mx ) m=1
, and m
π
exp −mx m=1 are all impulse-convergent sequences.
♦
Example 1.4.7 The sequences {sm (x)}∞
m=1 of functions with
1
sm (x) = m u − |x| , (1.4.26)
2m
⎧
⎨ −m, |x| < 2m 1
,
sm (x) = 2m, 2m ≤ |x| ≤
1 1
, (1.4.27)
⎩ m
0, |x| > m1 ,
and
(2m + 1)! m
sm (x) = 1 − x 2 u(1 − |x|) (1.4.28)
2 2m+1 (m!) 2
and
⎧ 2
⎨ m x + m, − m1 ≤ x ≤ 0,
sm (x) = −m 2 x + m, 0 ≤ x ≤ m1 , (1.4.30)
⎩
0, |x| > m1
1.4.2.2 Properties
We have
and
1.4 Step, Impulse, and Gamma Functions 41
1 x
δ(ax), φ(x) = δ(x), φ (1.4.35)
|a| a
∂2
min(t, s) = δ(s − t)
∂s∂t
= δ(t − s) (1.4.37)
∂2
max(t, s) = −δ(s − t)
∂s∂t
= −δ(t − s) (1.4.38)
Let us next introduce the concept of a test function and then consider the product
of a function and the n-th order derivative δ (n) (x) of the impulse function.
Definition 1.4.7 (test function) A real function φ satisfying the two conditions below
is called a test function.
(1) The function φ(x) is differentiable infinitely many times at every point x =
(x1 , x2 , . . . , xn ).
(2) There exists a finite number A such that φ(x) = 0 for every point x =
(x1 , x2 , . . . , xn ) satisfying x12 + x22 + · · · + xn2 > A.
−a 0 a x
n
f (x)δ (n) (x − b) = (−1)n (−1)k n Ck f (n−k) (b)δ (k) (x − b), (1.4.39)
k=0
∞
f (x)δ (n) (x), φ(x) = { f (x)φ(x)} δ (n−1) (x) −∞
∞
− { f (x)φ(x)} δ (n−1) (x)d x
−∞
..
.
∞
= (−1) n
{ f (x)φ(x)}(n) δ(x)d x (1.4.40)
−∞
n
using (1.4.33) because { f (x)φ(x)}(n) = n Ck f (n−k) (x)φ(k) (x). The result (1.4.41)
k=0
is the same as the symbolic expression (1.4.39). ♠
The result (1.4.39) implies that the product of a sufficiently smooth function f (x)
n
and δ (n) (x − b) can be expressed as a linear combination of δ (k) (x − b) k=0 with
1.4 Step, Impulse, and Gamma Functions 43
the coefficient of δ (k) (x − b) being the product of the number (−1)n−k n Ck and12 the
value f (n−k) (b) of f (n−k) (x) at x = b.
when n = 0. ♦
and
Example 1.4.15 From (1.4.43), we get δ (x) sin x = −(cos 0)δ(x) + (sin 0)δ (x)
= −δ(x). ♦
Theorem 1.4.2 For non-negative integers m and n, we have x m δ (n) (x) = 0 and
x m δ (n) (x) = (−1)m (n−m)!
n!
δ (n−m) (x) when m > n and m ≤ n, respectively.
n
δ (x − xm )
δ ( f (x)) = , (1.4.46)
m=1
| f (xm )|
Proof Assume that function f has one real simple zero x1 , and consider a sufficiently
small interval Ix1 = (α, β) with α < x1 < β. Because x1 is the simple zero of f , we
have f (x1 ) = 0. If f (x1 ) > 0, then f (x) increases from f (α) to f (β) as x moves
from α to β. Consequently, u ( f (x)) = u (x − x1 ) and ddx u ( f (x)) = δ (x − x1 )
f (x)) d f (x)
on the interval Ix1 . On the other hand, we have ddx u ( f (x)) = du( =
d f (x) dx
d f (x)
δ( f (x)) d x = δ( f (x)) f (x1 ). Thus, we get
{x: f (x)=0}
δ (x − x1 )
δ( f (x)) = . (1.4.47)
f (x1 )
δ (x − x1 )
δ( f (x)) = . (1.4.48)
| f (x1 )|
Example 1.4.17 Based on (1.4.46), it can be shown that δ((x − a)(x − b)) =
1
{δ(x − a) + δ(x − b)} when b > a, δ(tan x) = δ(x) when − π2 < x < π2 , and
b−a
δ(cos x) = δ x − π2 when 0 < x < π. ♦
For the function13 δ ( f (x)) = dδ(v)
dv
or
v= f (x)
dδ( f (x))
δ ( f (x)) = , (1.4.50)
d f (x)
n ! "
1 δ (x − xm ) f (xm )
δ ( f (x)) = + δ (x − x m ) , (1.4.51)
m=1
| f (xm )| f (xm ) { f (xm )}2
δ (x − xm ) f (xm ) 1
= δ (x − xm ) + δ (x − xm ) (1.4.52)
f (x) { f (xm )}
2 f (xm )
# $
13 Note that g ( f (x)) = dg(y)
dy = d
dx g( f (x)). In other words, g ( f (x)) denotes
# $ y= f (x)
dg(y) dg( f )
dy or d f , but not d x g( f (x)).
d
y= f (x)
1.4 Step, Impulse, and Gamma Functions 45
(x)
from (1.4.43) because 1
f (x) = − { ff (x)} 2 = − { ff (x(xm)}) 2 . Recollect also
x=xm m
x=xm
that
d n
δ (x − xm )
δ ( f (x)) = (1.4.53)
dx m=1
| f (xm )|
dδ( f ) d x dδ( f )
from (1.4.46). Then, because δ ( f ) = df
= d f dx
, we have
1 δ (x − xm )
n
δ ( f (x)) = (1.4.54)
f (x) m=1 | f (xm )|
n ! "
1 f (xm ) δ (x − xm )
δ ( f (x)) = δ (x − x m ) + , (1.4.55)
m=1
| f (xm )| { f (xm )}2 f (xm )
When the real simple zeroes of a sufficiently smooth function f (x) are {xm }nm=1 ,
Theorem 1.4.3 indicates that δ( f (x)) can be expressed as a linear combination
of {δ (x − xm )}nm=1 and Theorem 1.4.4 similarly indicates that δ ( f (x)) can be
n
expressed as a linear combination of δ (x − xm ) , δ (x − xm ) m=1 .
Example 1.4.18 The function f (x) = (x − 1)(x − 2) has two simple zeroes x =
1 and 2. We thus have δ ((x − 1)(x − 2)) = 2δ(x − 1) + 2δ(x − 2) − δ (x − 1) +
δ (x − 2) because f (1) = −1, f (1) = 2, f (2) = 1, and f (2) = 2. ♦
Example 1.4.19 The function f (x) = sinh 2x has one simple zero x = 0. Then, we
get δ (sinh 2x) = 21 21 δ (x) + 0 = 14 δ (x) from f (0) = 2 cosh 0 = 2 and f (0) =
4 sinh 0 = 0. ♦
In this section, we address definitions and properties of the factorial, binomial coef-
ficient, and gamma function (Andrews 1999; Wallis and George 2010).
n! = [n]n (1.4.57)
0! = 1. (1.4.58)
Example 1.4.20 If we use each of the five numbers {1, 2, 3, 4, 5} once, we can
generate 5! = 5 × 4 × 3 × 2 × 1 = 120 five-digit numbers. ♦
Definition 1.4.9 (permutation) The number of ordered arrangements with k differ-
ent items from n different items is
n Pk = [n]k (1.4.59)
n!
n Ck = (1.4.60)
(n − k)!k!
n
for k = 0, 1, . . . , n, and n Ck , written also as k
, is called the (n, k) combination.
The symbol n Ck shown in (1.4.60) is also called the binomial coefficient, and
satisfies
n Ck = n Cn−k . (1.4.61)
n Pk
n Ck = . (1.4.62)
k!
1.4 Step, Impulse, and Gamma Functions 47
Example 1.4.23 We can choose two different numbers from the set {0, 1, . . . , 9} in
10 C2 = 45 different ways. ♦
Proof We have
n n − n1 n − n 1 − n 2 − · · · − n m−1
···
n1 n2 nm
n! (n − n 1 )! (n − n 1 − n 2 − · · · − n m−1 )!
= ···
n 1 ! (n − n 1 )! n 2 ! (n − n 1 − n 2 )! nm !
n!
= (1.4.64)
n 1 !n 2 ! · · · n m !
because the number of desired results is the number that ω1 occurs n 1 times among n
occurrences, ω2 occurs n 2 times among the remaining n − n 1 occurrences, · · · , and
ωm occurs n m times among the remaining n − n 1 − n 2 · · · − n m−1 occurrences. ♠
Example 1.4.24 Let A = {1, 2, 3}, B = {4, 5}, and C = {6} in rolling a die. When
the rolling is repeated 10 times, the number
10 of results in which A, B, and C occur
four, five, and one times, respectively, is 4,5,1 = 1260. ♦
For α > 0,
∞
Γ (α) = x α−1 e−x d x (1.4.65)
0
and
48 1 Preliminaries
when n is a natural number. In other words, the gamma function can be viewed as a
generalization of the factorial.
Let us now consider a further generalization. When α < 0 and α = −1, −2, . . .,
we can define the gamma function as
Γ (α + k + 1)
Γ (α) = , (1.4.68)
α(α + 1)(α + 2) · · · (α + k)
where k is the smallest integer such that α + k + 1 > 0. Next, for a complex number
z, let14
1, n = 0,
(z)n = (1.4.69)
z(z + 1) · · · (z + n − 1), n = 1, 2, . . . .
(α + n)(α + n − 1) · · · (α + 1)α(α − 1) · · · 1
α! =
(α + 1)(α + 2) · · · (α + n)
(α + n)!
= (1.4.70)
(α + 1)n
(n+1)α n!
from (1.4.67)–(1.4.69). Rewriting (1.4.70) as α! = (α+1)n
and subsequently as
n α n! (n + 1)α
α! = , (1.4.71)
(α + 1)n nα
we have
n α n!
α! = lim (1.4.72)
n→∞ (α + 1)n
because lim (n+1)
α
α
= 1 + n1 1 + n2 · · · 1 + αn = 1. Based on (1.4.72), the
n→∞ n
gamma function for a complex number α such that α = 0, −1, −2, . . . can be defined
as
n α−1 n!
Γ (α) = lim , (1.4.73)
n→∞ (α)n
14 Here, (z)n is called the rising factorial, ascending factorial, rising sequential product, upper
factorial, Pochhammer’s symbol, Pochhammer function, or Pochhammer polynomial, and is the
same as Appell’s symbol (z, n).
1.4 Step, Impulse, and Gamma Functions 49
n α n!
Γ (α) = lim (1.4.74)
n→∞ (α)n+1
Γ (α + 1) = αΓ (α), (1.4.75)
lim pΓ ( p) = 1. (1.4.76)
p→0
Γ (cn + a)
lim (cn)b−a = 1. (1.4.77)
n→∞ Γ (cn + b)
We can similarly show that (1.4.77) also holds true for a < b.
The gamma function Γ (α) is analytic at all points except at α = 0, −1, . . .. In
α−1
addition, noting that lim (α + k)Γ (α) = lim lim (α+k)n (α)n
n!
or
α→−k α→−k n→∞
n −k−1 n!
lim (α + k)Γ (α) = lim
α→−k n→∞ (−k)(−k + 1) . . . (−k + n − 1)
(−1)k n!
= lim
k! n→∞ n k+1 (n − k − 1)!
(−1)k (n − k)(n − k + 1) · · · n
= lim
k! n→∞ n k+1
(−1) k
k k−1 0
= lim 1 − 1− ··· 1 −
k! n→∞ n n n
(−1)k
= (1.4.78)
k!
for 0 < x < 1, which is called the Euler reflection formula. Because we have Γ ( −
π
= (−1) π
n
n)Γ (n + 1 − ) = sin π{(n+1)−} sin π
when x = n + 1 − and Γ (−)Γ (1 +
π π
) = sin(π+π) = − sin π when x = 1 + from (1.4.79), we have
Γ (−)Γ (1 + )
Γ ( − n) = (−1)n−1 . (1.4.80)
Γ (n + 1 − )
Replacing x with 1
2
+ x and 3
4
+ x in (1.4.79), we have
1 1 π
Γ −x Γ +x = (1.4.81)
2 2 cos πx
and
√
1 3 2π
Γ −x Γ +x = , (1.4.82)
4 4 cos πx − sin πx
respectively.
with x = 1
2
in (1.4.79). ♦
1
Example 1.4.26 By recollecting (1.4.75) and (1.4.83), we can obtain Γ +k =
1 √ 2
2
+ k − 1 21 + k − 2 · · · 21 Γ 21 = 1×3×···×(2k−1)
2k
π, i.e.,
1 Γ (2k + 1)
Γ +k =
2 22k Γ (k + 1)
(2k)! √
= 2k π (1.4.84)
2 k!
for k ∈ {0, 1, . . .}. Similarly, we get
1 22k k! √
Γ −k = (−1)k π (1.4.85)
2 (2k)!
1.4 Step, Impulse, and Gamma Functions 51
for k ∈ {0, 1, . . .} using Γ 21 = 21 − 1 21 − 2 · · · 21 − k Γ 21 − k =
1
(−1)k 1×3×···×(2k−1)
2k
Γ 21 − k = (−1)k 2(2k)!
2k k! Γ 2
−k . ♦
α+1
1
β
Γ β
. Subsequently, we have
∞
1 α+1
t α exp −t β dt = Γ (1.4.88)
0 |β| β
∞ 0 α
t α exp −t β dt = ∞ v β e−v β1 v β −1 dv = − β1 Γ α+1
1
because 0 β
by letting v =
t β when β < 0.
When α ∈ {−1, −2, . . .}, we have (Artin 1964)
Γ (α + 1) → ±∞. (1.4.89)
More specifically, the value Γ α + 1± = α± ! can be expressed as
Γ α + 1± → ±(−1)α+1 ∞. (1.4.90)
√ √
15Here, π ≈ 1.7725, √
2π
≈ 3.6276, and 2π ≈ 4.4429. In addition, we have Γ 18 ≈ 7.5339,
1 1 3 1 1 1
Γ 7 ≈ 6.5481, Γ 6 ≈ 5.5663, Γ 5 ≈ 4.5908, Γ 4 ≈ 3.6256, Γ 3 ≈ 2.6789, Γ 23 ≈
1.3541, Γ 43 ≈ 1.2254, Γ 45 ≈ 0.4022, Γ 56 ≈ 0.2822, Γ 67 ≈ 0.2082, and Γ 78 ≈
0.1596.
52 1 Preliminaries
Γ (α + 1) Γ (−β)
= (−1)α−β . (1.4.91)
Γ (β + 1) Γ (−α)
Γ (y+1) Γ (y+1)
Based on (1.4.91), it is possible to obtain lim = lim =
x↓β, y↓α Γ (x+1) x↑β, y↑α Γ (x+1)
Γ (y+1) Γ (y+1)
(−1)α−β ΓΓ (−α)
(−β)
and lim = lim = (−1) α−β+1 Γ (−β)
: in essence,
x↓β, y↑α Γ (x+1) x↑β, y↓α Γ (x+1) Γ (−α)
we have16
⎧
⎨ Γ (α± ) = (−1)α−β Γ (−β+1) ,
Γ (β ± ) Γ (−α+1)
(1.4.92)
⎩ Γ (α± ) = (−1)α−β+1 Γ (−β+1) .
∓
Γ (β ) Γ (−α+1)
This expression is quite useful when we deal with permutations and combinations
of negative integers, as we shall see later in (1.A.3) and Table 1.4.
π
Γ (z)Γ (−z) = − (1.4.93)
z sin πz
2π
|Γ ( j)|2 = , (1.4.94)
eπ − e−π
where 2π
eπ −e−π
≈ 0.2720 and 0.15502 + 0.49802 ≈ 0.2720. ♦
Example 1.4.29 If we consider only the region α > 0, then the gamma function
(Zhang and Jian 1996) exhibits the minimum value Γ (α0 ) ≈ 0.8856 at α = α0 ≈
1.4616, and is convex downward because Γ (α) > 0. ♦
for complex numbers α and β such that Re(α) > 0 and Re(β) > 0. In this section,
let us show
Γ (α)Γ (β)
B̃(α, β) = . (1.4.96)
Γ (α + β)
1
We have x α−1 (1 − x)(1 − x)β−1 d x = B̃(α, β) − B̃(α +
B̃(α, β + 1) =
1 0
1
1, β) and B̃(α, β + 1) = α1 x α (1 − x)β x=0 + αβ 0 x α (1 − x)β−1 d x = αβ B̃(α +
1, β). Thus,
α+β
B̃(α, β) = B̃(α, β + 1), (1.4.97)
β
β 1
which can also be obtained as B̃(α, β + 1) = 01 x α+β−1 1−x 1 x α+β 1−x β
d x = α+β +
x x
x=0
β 1 α+β 1−x β−1 d x β 1 α−1 β
α+β 0 x x = α+β 0 x (1 − x)β−1 d x = α+β B̃(α, β). Using (1.4.97) repeatedly,
x2
we get
17 The right-hand side of (1.4.95) is called the Eulerian integral of the first kind.
54 1 Preliminaries
(α + β)(α + β + 1)
B̃(α, β) = B̃(α, β + 2)
β(β + 1)
(α + β)(α + β + 1)(α + β + 2)
= B̃(α, β + 3)
β(β + 1)(β + 2)
..
.
(α + β)n
= B̃(α, β + n), (1.4.98)
(β)n
1
Next, from (1.4.95) with β = 1, we get B̃(α, 1) = 0 t α−1 dt = α1 . Using this
result into (1.4.100) with β = 1, we get
∞
1 Γ (1)
= t α−1 e−t dt. (1.4.101)
α Γ (α + 1) 0
From (1.4.100) and (1.4.102), we get (1.4.96). Note that (1.4.100)–(1.4.102) implic-
itly dictates that the defining equation (1.4.65) of the gamma function Γ (α) for α > 0
is a special case of (1.4.73).
18The left-hand side of (1.4.102) is called the Eulerian integral of the second kind. In Exercise 1.38,
we consider another way to show (1.4.96).
1.5 Limits of Sequences of Sets 55
In this section, the properties of infinite sequences of sets are discussed. The expo-
sition in this section will be the basis, for instance, in discussing the σ-algebra in
Sect. 2.1.2 and the continuity of probability in Appendix 2.1.
Let us first consider the limits of sequences of numbers before addressing the limits
of sequences of sets. When ai ≤ u and ai ≥ v for every choice of a number ai from
the set A of real numbers, the numbers u and v are called an upper bound and a lower
bound, respectively, of A.
Definition 1.5.1 (least upper bound; greatest lower bound) For a subset A of real
numbers, the smallest among the upper bounds of A is called the least upper bound
of A and is denoted by sup A, and the largest among the lower bounds of A is called
the greatest lower bound and is denoted by inf A.
and
respectively. When there exists no upper bound and no lower bound of A, it is denoted
by sup A → ∞ and inf A → −∞, respectively.
Example 1.5.1 For the set A = {1, 2, 3}, the least upper bound is sup A = 3 and
the greatest lower bound is inf A = 1. For the sequence {an } = {0, 1, 0, 1, . . .}, the
least upper bound is sup an = 1 and the greatest lower bound is inf an = 0. ♦
lim sup xn = xn
n→∞
= inf sup xk (1.5.3)
n≥1 k≥n
and
56 1 Preliminaries
lim inf xn = xn
n→∞
= sup inf xk (1.5.4)
n≥1 k≥n
and
1 √ √
zn = 4n + 3 + (−1)n 4n − 5 , (1.5.7)
2n
√
19 The value xn = 2 can be obtained by solving the equation xn = 2 + xn .
1.5 Limits of Sequences of Sets 57
sequences. ♦
∞
lim Bn = ∩ Bi (1.5.9)
n→∞ i=1
The limit lim Bn denotes the set of points contained in at least one of {Bn }∞
n=1
n→∞
and in every set of {Bn }∞ ∞
n=1 when {Bn }n=1 is a non-decreasing and a non-increasing
sequence, respectively.
∞
because 1, 2 − n1 n=1
is a non-decreasing sequence. Likewise, the limit of the
∞
non-decreasing sequence {(−n, a)}∞
n=1 is lim Bn = ∪ (−n, a) = (−∞, a). ♦
n→∞ n=1
∞
Example 1.5.7 The sequence a, a + n1 n=1
is a non-increasing sequence and has
∞
the limit lim Bn = ∩ a, a + n1 or
n→∞ n=1
∞
which is a singleton set {a}. The non-increasing sequence 1 − n1 , 1 + n1 n=1 has
∞
the limit lim Bn = ∩ 1 − n1 , 1 + n1 = (0, 2) ∩ 21 , 23 ∩ · · · = [1, 1], also a sin-
n→∞ n=1
gleton set. Note that
!
1
lim 0, = {0} (1.5.12)
n→∞ n
is different from
!
1
0, lim =∅ (1.5.13)
n→∞ n
in (1.5.11). ♦
∞
Example 1.5.8 Consider the set S = {x : 0 ≤ x ≤ 1}, sequence {Ai }i=1 with
∞
Ai = x : i+1 < x ≤ 1 , and sequence {Bi }i=1 with Bi = x : 0 < x < 1i .
1
∞
Then, because {Ai }i=1 is a non-decreasing sequence, we have lim An = x : 21
n→∞
< x ≤ 1} ∪ x : 1
3
< x ≤ 1 ∪ · · · = {x : 0 < x ≤ 1} and S = {0} ∪ lim An .
n→∞
∞
Similarly, because {Bi }i=1 is a non-increasing sequence, we have lim Bn =
n→∞
{x : 0 < x < 1} ∩ x : 0 < x < 21 ∩ · · · = {x : 0 < x ≤ 0} = ∅. ♦
∞ ∞
Example 1.5.9 The sequences 1 + n1 , 2 n=1 and 1 + n1 , 2 n=1 of interval sets
are both non-decreasing
sequences
∞ with thelimits (1, 2) and (1, 2], respectively. The
∞
sequences a, a + n1 n=1 and a, a + n1 n=1 are both non-increasing sequences
with the limits (a, a] = ∅ and [a, a] = {a}, respectively. ♦
∞ ∞
Example 1.5.10 The sequences 1 − n1 , 2 n=1 and 1 − n1 , 2 n=1 are both non-
increasing sequences with the limits [1, 2) and [1, 2], respectively. The sequences
1.5 Limits of Sequences of Sets 59
∞ ∞
1, 2 − n1 n=1 and 1, 2 − n1 n=1 are both non-decreasing sequences with the
limits (1, 2) and [1, 2), respectively. ♦
∞ ∞
Example 1.5.11 The sequences 1 + n1 , 3 − n1 n=1 and 1 + n1 , 3 − n1 n=1 are
both non-decreasing sequences
with the common limit (1, 3). Similarly, the non-
∞ ∞
decreasing sequences 1 + n1 , 3 − n1 n=1 and 1 + n1 , 3 − n1 n=1 both have the
limit20 (1, 3). ♦
∞ ∞
Example 1.5.12 The four sequences 1 − n1 , 3 + n1 n=1 , 1 − n1 , 3 + n1 n=1 ,
∞ ∞
1 − n1 , 3 + n1 n=1 , and 1 − n1 , 3 + n1 n=1 are all non-increasing sequences with
the common limit21 [1, 3]. ♦
We have discussed the limits of monotonic sequences in Sect. 1.5.2. Let us now
consider the limits of general sequences. First, note that any element in a set of an
infinite sequence belongs to
(1) every set,
(2) every set except for a finite number of sets,
(3) infinitely many sets except for other infinitely many sets, or
(4) a finite number of sets.
Keeping these four cases in mind, let us define the lower bound and upper bound
sets of general sequences.
Definition 1.5.6 (lower bound set) For a sequence of sets, the set of elements belong-
ing to at least almost every set of the sequence is called the lower bound or lower
bound set of the sequence, and is denoted by22 lim inf or by lim .
n→∞ n→∞
Let us express the lower bound set lim inf Bn = Bn of the sequence {Bn }∞
n=1 in
n→∞
terms of set operations. First, note that
∞
G i = ∩ Bk (1.5.14)
k=i
20 In short, irrespective of the type of the parentheses, the limit is in the form of ‘(a, . . .’ for an
interval when the beginning point is of the form a + n1 , and the limit is in the form of ‘. . . , b)’ for
an interval when the end point is of the form b − n1 .
21 In short, irrespective of the type of the parentheses, the limit is in the form of ‘[a, . . .’ for an
interval when the beginning point is of the form a − n1 , and the limit is in the form of ‘. . . , b]’ for
an interval when the end point is of the form b + n1 .
22 The acronym inf stands for infimum or inferior.
60 1 Preliminaries
{Bn }∞
n=1 = {0, 1, 3}, {0, 2}, {0, 1, 2}, {0, 1}, {0, 1, 2}, {0, 1}, . . . (1.5.16)
∞ ∞
lim sup Bn = ∩ ∪ Bk , (1.5.17)
n→∞ i=1 k=i
which, together with (1.5.19), confirms the intuitive observation that the upper bound
is not smaller than the lower bound.
In Definition 1.5.5, we addressed the limit of monotonic sequences. Let us now
extend the discussion to the limits of general sequences of sets.
Definition 1.5.8 (convergence of sequence; limit set) If
=B (1.5.22)
from (1.1.2) using (1.5.19) and (1.5.21). In such a case, the sequence {Bn }∞
n=1 is
called to converge to B, which is denoted by Bn → B or lim Bn = B. The set B is
n→∞
called the limit set or limit of {Bn }∞
n=1 .
∞
lim inf Bn = ∪ Bi
n→∞ i=1
= lim Bn . (1.5.23)
n→∞
∞ n n
We also have ∪ Bk = lim ∪ Bk = lim Bn from ∪ Bk = Bn for any value of i.
k=i n→∞ k=i n→∞ k=i
∞ ∞ ∞
Thus, we have lim sup Bn = ∩ ∪ Bk = ∩ lim Bn , consequently resulting in
n→∞ i=1 k=i i=1 n→∞
∞ ∞
Next, assume {Bn }∞
n=1 is a non-increasing sequence. Then, lim inf Bn = ∪ ∩ Bk =
n→∞ i=1 k=i
∞ ∞ n
∪ lim Bn = lim Bn because ∩ Bk = lim Bn from ∩ Bk = Bn for any i. We also
i=1 n→∞ n→∞ k=i n→∞ k=i
∞ ∞ ∞ ∞ n
have lim sup Bn = ∩ ∪ Bk = ∩ Bi = lim Bn because ∪ Bk = lim ∪ Bk = Bi
n→∞ i=1 k=i i=1 n→∞ k=i n→∞ k=i
n
from ∪ Bk = Bi for any i.
k=i
∞
Example 1.5.15 Obtain the limit of {Bn }∞
n=1 = 1 − n1 , 3 − n1 n=1 .
Solution First, because B1 = (0, 2), B2 = 21 , 25 , B3 = 23 , 83 , B4 = 43 , 11
4
, . . .,
∞ ∞ 5 ∞
we have G 1 = ∩ Bk = [1, 2), G 2 = ∩ Bk = 1, 2 , · · · and H1 = ∪ Bk = (0, 3),
k=1 k=2 k=1
∞ 1 ∞
H2 = ∪ Bk = 2 , 3 , · · · . Therefore, the lower bound is lim inf Bn = ∪ G i =
k=2 n→∞ i=1
∞
[1, 3), the upper bound is lim sup Bn = ∩ Hi = [1, 3), and the limit is lim Bn =
n→∞ i=1 n→∞
1.5 Limits of Sequences of Sets 63
[1,
3). 1We can1 similarly
show all1 [1, 3)
that the limits are 24
for the sequences
∞ 1 ∞ ∞
1 − n , 3 − n n=1 , 1 − n , 3 − n n=1 , and 1 − n , 3 − n1 n=1 .
1
♦
∞
Example 1.5.16 Obtain the limit of the sequence 1 + n1 , 3 + n1 n=1 of intervals.
Solution First, because B1 = (2, 4), B2 = 23 , 27 , B3 = 43 , 10
3
, B4 = 54 , 13
4
, . . .,
∞ ∞ 3 ∞
we have G 1 = ∩ Bk = (2, 3], G 2 = ∩ Bk = 2 , 3 , · · · and H1 = ∪ Bk = (1, 4),
k=1 k=2 k=1
∞ 7 ∞
H2 = ∪ Bk = 1, 2 , · · · . Thus, the lower bound is lim inf Bn = ∪ G i = (1, 3], the
k=2 n→∞ i=1
∞
upper bound is lim sup Bn = ∩ Hi = (1, 3], and the limit set is lim Bn = (1, 3].
i=1 n→∞
n→∞ ∞
We can similarly show that the limits of the sequences 1 + n1 , 3 + n1 n=1 ,
∞ ∞
1 + n1 , 3 + n1 n=1 , and 1 + n1 , 3 + n1 n=1 are all (1, 3]. ♦
Appendices
Recollecting that
α! = Γ (α + 1) (1.A.1)
for a complex number p, where J− = {−1, −2, . . .} denotes the set of negative
integers. Therefore, 0! = Γ (1) = 1 for p = 0.
24 As it is mentioned in Examples 1.5.11 and 1.5.12, when the lower end value of an interval is
in the form of a − n1 and a + n1 , the limit is in the form of ‘[a, . . .’ and ‘(a, . . .’, respectively. In
addition, when the upper end value of an interval is in the form of b + n1 and b − n1 , the limit is in
the form of ‘. . . , b]’ and ‘. . . , b)’, respectively, for both open and closed ends.
64 1 Preliminaries
Example 1.A.1 From (1.A.2), it is easy to see that (−2)! = ±∞ and that − 21 ! =
1 √
Γ 2 = π from (1.4.83). ♦
⎧ Γ ( p+1)
⎪
⎪ , p ∈
/ J− and p−q ∈
/ J− ,
⎨ Γ ( p−q+1)
(−1)q Γ Γ(−(−p+q) , p ∈ J− and p−q ∈ J− ,
p Pq = p) (1.A.3)
⎪
⎪ ∈
/ J− p−q ∈ J− ,
⎩ 0, p and
±∞, p ∈ J− and p−q ∈
/ J−
0, z = 2, 3, . . . ,
1 Pz = (1.A.5)
1
Γ (2−z)
, otherwise,
±∞, z ∈ J− ,
z Pz = (1.A.6)
/ J− ,
Γ (z + 1), z ∈
and
(−1)z z Pz , z = 0, 1 . . . ,
−1 Pz =
±∞, otherwise
(−1)z Γ (z + 1), z = 0, 1 . . . ,
= (1.A.7)
±∞, otherwise
from (1.A.3). ♦
Using (1.A.3), we can also get −2 P−0.3 = ±∞, −0.1 P1.9 = 0, 0 P3 = ΓΓ(−2) (1)
= 0,
√
Γ(2)
3
Γ(2)
3
π Γ (4) Γ (4)
1 P3 = = 8 , 21 P0.8 = Γ (0.7) = 2Γ (0.7) , 3 P 21 = Γ 7 = 5√π , 3 P− 21 = Γ 9 =
3 16
2 Γ (− 23 ) (2) (2)
Γ (4) Γ ( 21 ) Γ (−1) Γ (5)
32
√ , P
35 π 3 −2
= Γ (6) = 20 , − 21 P3 = Γ − 5 = − 15 , and −2 P3 = Γ (−4) = (−1) Γ (2)
1 8
=
( 2)
−24. Table 1.3 shows some values of the permutation p Pq .
Appendices 65
Example 1.A.4 When both p and p − q are negative integers and q is a non-negative
integer, the binomial coefficient p Cq = (−1)q − p+q−1 Cq can be expressed also as
Γ ( p+1)
Table 1.4 The binomial coefficient p Cq = Γ ( p−q+1)Γ (q+1) in the complex space
Is p ∈ J− ? Is q ∈ J− ? Is p − q ∈ J− ? p Cq
Γ ( p+1)
No No No Γ ( p−q+1)Γ (q+1)
Γ (−q)
Yes Yes No (−1) p−q Γ ( p−q+1)Γ (− p)
= (−1) p−q −q−1 C p−q
Γ (− p+q)
Yes No Yes (−1)q Γ (− p)Γ (q+1)
= (−1) − p+q−1 Cq
q
by recollecting ΓΓ (α+1)
(β+1)
= (−1)α−β ΓΓ (−α)
(−β)
shown in (1.4.91) for α − β an integer,
α < 0, and β < 0. The two formulas (1.A.8) and (1.A.9) are the same as −r Cx =
(−1)x r +x−1 Cx , which we will see in (2.5.15) for a negative real number −r and a
non-negative integer x. ♦
expressed as
1
0 Cz =
Γ (1 − z)Γ (1 + z)
⎧
⎨ 1, z = 0,
= 0, z = ±1, ±2, . . . , (1.A.10)
⎩ 1
, otherwise
Γ (1−z)Γ (1+z)
and
⎧
⎨ 1, z = 0, 1,
1 Cz = 0, z = −1, ±2, ±3, . . . , (1.A.11)
⎩ 1
, otherwise,
Γ (2−z)Γ (1+z)
respectively. ♦
(−3)!
We can similarly obtain25 −3 C−2 = −3 C−1 =(−2)!(−1)!
= 0, −3 C2 = −3 C−5 =
(−3)! (−7)!
(−5)!2!
= (−1)2 2!2!
4!
= 4 C2 , −7 C3 = −7 C−10 = (−10)!3!
= (−1) 6!3!
9!
= −9 C3 , and
Γ ( 27 )
5 C2 = 5 C 1 = = 15
2 Γ (3)Γ ( 23 ) 8
. Table 1.5 shows some values of the binomial
2 2
coefficient.
Example 1.A.6 Obtain the series expansion of h(z) = (1 + z) p for p a real number.
∞
(1 + z) p = p Ck z
k
. (1.A.12)
k=0
p
(1 + z) p = p Ck z
k
(1.A.13)
k=0
∞
(1 + z) p = p Ck z
p−k
(1.A.14)
k=0
for p < 0 and |z| > 1. Combining (1.A.12) and (1.A.14), we eventually get
⎧ ∞
⎪
⎪
⎨ p Ck z ,
k
for p ≥ 0 or for p < 0, |z| < 1,
(1 + z) p
= k=0
∞ (1.A.15)
⎪
⎪ p−k
, for p < 0, |z| > 1,
⎩ p Ck z
k=0
∞
k
Alternatively, from 1
(1+z)2
= 1
1−(−2z−z 2 )
= −2z − z 2 for −2z − z 2 < 1, we
k=0
have
1
= 1 + −2z − z 2 + 4z 2 + 4z 3 + z 4
(1 + z)2
+ −8z 3 + · · · + · · · , (1.A.18)
by changing the order in the addition. The result26 (1.A.19) is the same as (1.A.17)
for |z| < 1.
Theorem 1.A.1 For γ ∈ {0, 1, . . .} and any two numbers α and β, we have
√
26 In writing (1.A.19) from (1.A.18), we assume −2z − z 2 < 1, 0 < |Re(z) + 1| ≤ 2 , a
√
proper subset of the region |z| < 1. Here, −2z − z 2 < 1, 0 < |Re(z) + 1| ≤ 2 is the right
half of the dumbbell-shaped region −2z − z 2 < 1, which is a proper subset of the rectangle
√
|Im(z)| ≤ 21 , |Re(z) + 1| ≤ 2 .
Appendices 69
γ
α β α+β
= , (1.A.20)
m=0
γ−m m γ
γ
α − γc α − mc β β + mc α + β − γc α + β
= (1.A.21)
m=0
α − mc γ − m β + mc m α+β γ
Γ (c)Γ (c − a − b)
2 F1 (a, b; c; 1) = , Re(c) > Re(a + b) (1.A.22)
Γ (c − a)Γ (c − b)
∞
(a)n (b)n z n
2 F1 (a, b; c; z) = , Re(c) > Re(b) > 0 (1.A.23)
n=0
(c)n n!
27 The function 2 F1 is also called Gauss’ hypergeometric function, and a special case of the gener-
alized hypergeometric function
∞
(α1 )k (α2 )k · · · (α p )k z k
p Fq (α1 , α2 , . . . , α p ; β1 , β2 , . . . , βq ; z) = .
(β1 )k (β2 )k · · · (βq )k k!
k=0
π
Also, note that 2 F1 1, 1; 23 ; 21 = 2.
70 1 Preliminaries
γ
α β α β α+β α
n
α!
= γ−m m
+ γ−m m
= γ
noting that γ−m
= (α−γ+m)!
m=0 m=γ+1
1
(γ−m)!
= 0 for m = γ + 1, γ + 2, . . . due to (γ − m)! = ±∞. Similarly, when γ =
n
α β
γ
α β α β α+β
γ
n + 1, n + 2, . . ., we have γ−m m
= γ−m m
− γ−m m
= γ
m=0 m=0 m=n+1
because mβ = mn = (n−m)!m! n!
= 0 from (n − m)! = ±∞ for m = n + 1, n + 2, . . ..
In short, we have
n
α n α+n
= (1.A.25)
m=0
γ−m m γ
γ
γ−1
α β α β−1
[m]ζ0 +1 =β [m]ζ0
m=0
γ−m m γ−1−m m
m=ζ0
α + β − 1 − ζ0
= β[β − 1]ζ0
γ − 1 − ζ0
α + β − (ζ0 + 1)
= [β]ζ0 +1 . (1.A.28)
γ − (ζ0 + 1)
The result (1.A.28) implies that (1.A.26) holds true also when ζ = ζ0 + 1 if (1.A.26)
holds true when ζ = ζ0 . In short, (1.A.26) holds true for ζ ∈ {0, 1, . . .}.
(Method 2) Noting (Charalambides 2002; Gould 1972) that [m]ζ mβ =
[β] [β−ζ]m−ζ β−ζ γ
α
β−ζ α+β−ζ
[m]ζ [m]ζ ζ (m−ζ)! = [β]ζ m−ζ , we can rewrite (1.A.26) as γ−m m−ζ
= γ−ζ ,
m=ζ
γ−ζ
α
β−ζ α+β−ζ
which is the same as the Chu-Vandermonde convolution γ−ζ−k k
= γ−ζ
.
k=0
♠
−1α = −4,
Example 1.A.12 Assume −4β = −1, γ = 3, and ζ = 2 in (1.A.26). then, the
−1
left-hand side is 2 −4
1 2
+ 6 0 3
(−4)! (−1)!
= 2 × (−5)!1! (−3)!2!
(−4)! (−1)!
+ 6 × (−4)!0! (−4)!3!
=
−7 (−7)!
−14 and the right-hand side is (−1)(−2) 1 = 2 × (−8)!1! = −14. ♦
Example 1.A.13 The identity (1.A.26) holds true also for non-integer values of α
or β. For example, when α = 21 , β = − 21 , γ = 2, and ζ = 1, the left-hand side is
1 − 1 1 − 1
= −( 21 )!1! (− 32 )!1! + 2 (12 )!0! (− 52 )!2! = 21 and the right-hand side
2 2 + 2 2 2
1
! −1 ! 1
! −1 !
1 1
0 2 ( 2 ) ( 2 ) ( 2 ) ( 2)
is − 21 × −1 1
= − 1
2
× Γ (0)
Γ (−1)
= − 1
2
× (−1)!
(−2)!1!
= 1
2
. ♦
√
Example 1.A.14 Denoting the unit imaginary number by j = −1, assume α = e
− j, β = π + 2 j, γ = 4, and ζ = 2 in (1.A.26). Then, the left-hand
side is 0 + 0 + 2 × α2 β2 + 6 × α1 β3 + 12 × α0 β4 = 21 α(α − 1)β(β − 1) + αβ(β − 1)(β −
2) + 21 β(β − 1)(β − 2)(β − 3) = 21 β(β − 1) {α(α − 1) + 2α(β − 2) + (β − 2)(β − 3)} = 21 β(β −
1) α2 + α(2β − 5) + (β − 2)(β − 3) = 21 β(β − 1)(α + β − 2)(α + β − 3)
and the right-hand
side is also β(β − 1) α+β−2
2
= β(β − 1) (α+β−2)!
(α+β−4)!2!
= 1
2
β(β − 1)(α + β − 2)(α +
β − 3). ♦
72 1 Preliminaries
γ
α β α+β−1
m =β
m=0
γ−m m γ−1
βγ α + β
= (1.A.29)
α+β γ
and
γ
α β α+β−2
m(m − 1) = β(β − 1)
m=0
γ−m m γ−2
β(β − 1)γ(γ − 1) α + β
= , (1.A.30)
(α + β)(α + β − 1) γ
respectively. The two results (1.A.29) and (1.A.30) will later be useful for obtaining
the mean and variance of the hypergeometric distribution in Exercise 3.68. ♦
Using (1.A.32) and Γ (α)Γ (β) = Γ (α + β) B̃(α, β) from (1.4.96) will lead us to
∞
s x−1
Γ (1 − x)Γ (x) = ds (1.A.33)
0 s+1
in the complex space. The contour C of the integral in (1.A.34) is shown in Fig. 1.26,
a counterclockwise path along the outer circle. As there exists only one pole z = 1
Appendices 73
after some steps. When x > 0 and p → 0, the third term in the right-hand side
−π+ j p x e jθx
of (1.A.37) is lim π− pe jθ −1 dθ = 0. Similarly, the first term in the right-hand
p→0
x jθx
jR e √ Rx Rx
side of (1.A.37) is Re jθ −1 = ≤ R−1 → 0 when x < 1 and R → ∞.
R 2 −2R cos θ+1
Therefore, (1.A.37) can be written as
0 x−1 jπx ∞ x−1 − jπx
z x−1 r e r e
dz = 0 + dr + 0 + dr
C z−1 ∞ −r − 1 0 −r − 1
∞ r x−1
= e jπx − e− jπx dr
0 r +1
∞ x−1
r
= 2 j sin πx dr (1.A.38)
0 r +1
z2
z3
Re(z)
×
p 1 R
z1 z4
∞
r x−1 π
dr = (1.A.39)
0 r +1 sin πx
for 0 < x < 1 from (1.A.35) and (1.A.38) and, subsequently, we have (1.A.31) for
0 < x < 1 from (1.A.33) and (1.A.39).
Consider crossing a creek via n − 1 stepping stones with steps 0 and n denoting the
two banks of the creek. Assume we can move only in one direction and skip either
0 or 1 step at each move.
(1) Obtain the number of ways we can complete the crossing in k moves.
(2) Obtain the number of ways we can complete the crossing.
Solution (1) Let a(n, k) be the number of ways we can complete the crossing in k
moves and let
%n &
n2 = , (1.A.40)
2
where #x$ denotes the smallest integer not smaller than x and is called the ceiling
function. Denote the number of moves in which we skip 0 and 1 step by k1 and
k2 , respectively. Then, from k1 + k2 = k and k1 + 2k2 = n, we get k1 = 2k − n
and k2 = n − k. The number a(n, k) of ways we can complete the crossing in
k moves is the same as the number k1k!!k2 ! of arranging k1 of 1’s and k2 of 2’s. In
short, we have a(n, k) = (2k−n)!(n−k)!
k!
, i.e.,
Table 1.6 Number a(n, k) = k Cn−k of ways we can cross a creek via n − 1 stepping stones in k
moves when we can move only in one direction and skip either 0 or 1 step at each move
n
1 2 3 4 5 6 7 8 9 10 11
k 1 1 1 0 0 0 0 0 0 0 0 0
2 0 1 2 1 0 0 0 0 0 0 0
3 0 0 1 3 3 1 0 0 0 0 0
4 0 0 0 1 4 6 4 1 0 0 0
5 0 0 0 0 1 5 10 10 5 1 0
6 0 0 0 0 0 1 6 15 20 15 6
7 0 0 0 0 0 0 1 7 21 35 35
8 0 0 0 0 0 0 0 1 8 28 56
9 0 0 0 0 0 0 0 0 1 9 36
10 0 0 0 0 0 0 0 0 0 1 10
11 0 0 0 0 0 0 0 0 0 0 1
n
a(n) = k Cn−k . (1.A.42)
k=n 2
We also have a(1) = 1 and a(2) = 2. Therefore, a(n) is the sum of the number
of ways from step n − 1 to n and that from n − 2 directly to n. We thus have
a(n) = a(n − 1) + a(n − 2) for n = 3, 4, . . . because the number of ways to
step n − 1 is a(n − 1) and that to step n − 2 is a(n − 2). Solving the recursion,
we get 28
⎧' ⎫
√ (n+1 ' √ (n+1 ⎬
1 ⎨ 1+ 5 1− 5
a(n) = √ − . (1.A.43)
5⎩ 2 2 ⎭
⎧' ⎫
√ (n+1 ' √ (n+1 ⎬
1 ⎨ 1+ 5
n
1− 5
k Cn−k = √ − (1.A.44)
k=0 5⎩ 2 2 ⎭
from (1.A.42) and (1.A.43). The result (1.A.44) is the same as the well-
% n2 &
∞
known identity (Roberts and Tesman 2009) Fn+1 = n−k Ck , where {Fn }n=0
k=0
are Fibonacci numbers. Here, %x&, also expressed as [x], denotes the greatest
integer not larger than x and is called the floor function or Gauss function.
Clearly, %x& = #x$ = x when x is an integer while %x& = #x$ − 1 when x is not
an integer.
If the condition ‘skip either 0 or 1 step at each move’ is replaced with ‘skip either
0 step or 2 steps at each move’, then we have
n
m C n−m
2
= a13 r13
n
+ 2Re c13 z 13
n
(1.A.45)
m=n 3 ,n 3 +2,...
, -
for n = 0, 1, . . ., where n 3 = n − 2 n3 . In addition, the three numbers r13 =
∗
3 (1
1
+ 213 ), z 13 = 13 1 − 13 − j 3 213 − 1 , and z 13 with the unit imagi-
√
nary number j = −1 are the solutions to the difference equation a(n) =
a(n − 1) + a(n − 3) for n ≥ 4 with initial conditions a(1) = 1, a(2) = 1,
and a(3) = 2. The three numbers are also the solutions to θ3 − θ2 − 1 =
√ 1 √ 1
≈ 1.6984, a13 = r (r13 −z( )13 r −z
3 3 (z −1) z ∗ −1)+1
0, and 13 = 21 29−3 93
+ 29+32 93 =
13 ( 13 13 )
2 ∗
13 13
n
0 1∞
n
m C n−m
2
= {1, 1, 1, 2, 3, 4, 6, 9, . . .} , (1.A.48)
m=n 3 ,n 3 +2,... n=0
and
0 1∞
n
respectively.
When some operations such as limit, integration, and differentiation are evaluated,
a change of order does not usually make a difference. However, the order is of
importance in certain cases. Here, we present some examples in which a change of
order yields different results.
1
lim f n (x)d x = 1. (1.A.51)
n→∞ 0
On the other hand, lim f n (x) is 0 for 0 ≤ x ≤ 1 because f n (x) = 0 for x = 0 and
n→∞
1 1
lim f n (x) = x1 lim (nx)
2
x2
we have lim lim f (x, y) = lim lim f (x, y) because lim lim f (x, y) = lim 2 =1
x→0 y→0 y→0 x→0 x→0 y→0 x→0 x
lim −y
2
and lim lim f (x, y) = y 2 = −1. ♦
y→0 x→0 y→0
Example 1.A.19 (Gelbaum and Olmsted 1964) Consider a sequence{ f n (x)}∞ n=1 of
functions in which f n (x) = 1+nx2 x 2 for |x| ≤ 1. Then, ddx lim f n (x) = 0, but
n→∞
d 1 − n2 x 2
lim f n (x) = lim 2
n→∞ dx n→∞
1 + n2 x 2
1, x = 0
= (1.A.53)
0, 0 < |x| ≤ 1,
implying that d
dx
lim f n (x) = lim d
dx
f n (x) . ♦
n→∞ n→∞
∂
value of x. On the other hand, ∂x
f (x, y) = 0 for any value of y when x = 0 because
0 2
∂ 3x 2
− 2x 4
exp − xy , y > 0,
f (x, y) = y2 y3 (1.A.55)
∂x 0, y = 0.
In other words,
01 2
3x 2 2x 4
1
∂ − exp − xy x = 0,
f (x, y) dy = 1
0 y2 y3
0 ∂x 0 dy, x =0
0
2
(1 − 2x ) exp −x , x = 0,
2
= (1.A.56)
0, x = 0,
1 1∂
and thus d
dx 0 f (x, y)dy = 0 ∂x f (x, y)dy. ♦
Example 1.A.21 (Gelbaum and Olmsted 1964) Consider the Cantor function
φC (x) discussed in Example 1.3.11 and f (x) = 1 for x ∈ [0, 1]. Then, both
1
the Riemann-Stieltjes and Lebesgue-Stieltjes integrals produce 0 f (x)dφC (x) =
1
[ f (x)φC (x)]10 − 0 φC (x)d f (x) = φC (1) − φC (0) − 0, i.e.,
1
f (x) dφC (x) = 1 (1.A.57)
0
1 1
while the Lebesgue integral results in 0 f (x)φC (x)d x = 0 0d x = 0. ♦
Theorem 1.A.3 If the sum α + β and product αβ of two numbers α and β are both
integers, then
αn + β n = integer (1.A.58)
Proof Let us prove the theorem via mathematical induction. It is clear that (1.A.58)
holds true when n = 0 and 1. When n = 2, α2 + β 2 = (α + β)2 − 2αβ is an integer.
Assume αn + β n are all integers for n = 1, 2, . . . , k − 1. Then,
αk + β k = (α + β)k − k C1 αβ αk−2 + β k−2 − k C2 (αβ)2 αk−4 + β k−4
0 k−1
k C k−1 (αβ) 2 (α + β) , k is odd,
−··· − 2
k (1.A.59)
k C 2k (αβ) 2 , k is even,
80 1 Preliminaries
Dn = αa n − βbn (1.A.60)
and
Appendices 81
(3) {a n − bn }∞
n=1 is a sequence that first increases and then decreases with the max-
imum at n = #r $ if r is not an integer and at n = r and n = r + 1 if r is an
integer when a + b > 1 and 0 < a < 1.
♦
Example 1.A.24 Assume α = β = 1, a = 0.95, and b = 0.4. Then, we have
0.95 −%4 < 0.95 2
− 42 < 0.953 − 43 and 0.953 − 43 > 0.954 − 44 > · · · because
0.6 &
ln 0.05
#r $ = ln 0.95 ≈ #2.87$ = 3. ♦
0.4
The number of ways to select r different elements from a set of n distinct elements is
n Cr . Because every element will be selected as many times as any other element, each
of the n elements will be selected n Cr × nr = n−1 Cr −1 times over the n Cr selections.
Each of the n elements will be included at least once if we choose appropriately
%n &
m1 = (1.A.62)
r
selections among the n Cr selections.
. / For example, assume the set {1, 2, 3, 4, 5} and
r = 2. Then, we have m 1 = 25 = 3, and thus, each of the five elements is included
at least once in the three selections (1, 2), (3, 4), and (4, 5).
Next, it is possible that one or more elements will not be included if we consider
n−1 Cr selections or less among the total n Cr selections. For example, for the set
{1, 2, 3, 4, 5} and r = 2, in some choices of 4 C2 = 6 selections or less such as (1, 2),
(1, 3), (1, 4), (2, 3), (2, 4), and (3, 4) among the total of 5 C2 = 10 selections, the
element 5 is not included.
On the other hand, each of the n elements will be included at least once in any
m2 = 1 + n−1 Cr (1.A.63)
The identity (1.A.64) implies that the number of ways for a specific element not to
be included when selecting r elements from a set of n distinct elements is the same
as the following two numbers:
(1) The number of ways to select r elements from a set of n − 1 distinct elements.
(2) The difference between the number of ways to select r elements from a set of n
distinct elements and that for a specific element to be included when selecting r
elements from a set of n distinct elements.
82 1 Preliminaries
In addition, we have
b g2 (x)
f (x, y)d xd y = f (x, y)d yd x (1.A.66)
a g1 (x)
A
n
N (n) = M(n, k). (1.A.67)
k=1
1: {1},
2: {2}, {1,1},
(1.A.68)
3: {3}, {2,1}, {1,1,1},
4: {4}, {3,1}, {2,2}, {2,1,1}, {1,1,1,1},
Theorem 1.A.6 Denote the least common multiplier of k consecutive natural num-
bers 1, 2, . . . , k by k̃. Let the quotient and remainder of n when divided by k be Q k
and Rk , respectively. If we write
n = k̃ Q k̃ + Rk̃ , (1.A.70)
k−1
M(n, k) = ci,k Rk̃ Q ik̃ , Rk̃ = 0, 1, . . . , k̃ − 1 (1.A.71)
i=0
k−1
in terms of k̃ polynomials of order k − 1 in Q k̃ , where ci,k (·) i=0
are the coefficients
of the polynomial.
and
⎧ 3
⎪
⎪ n + 3n 2 , R12 = 0,
⎪ 3
⎪
⎪
⎪ n + 3n 2 − 20, R12 = 2,
⎪ 3
⎪
⎪
⎪ n + 3n 2 + 32, R12 = 4,
⎪
⎪
⎨ n 3 + 3n 2 − 36, R12 = 6,
144 M(n, 4) = n 3 + 3n 2 + 16, R12 = 8, (1.A.74)
⎪
⎪
⎪
⎪ n 3 + 3n 2 − 4, R12 = 10,
⎪
⎪
⎪
⎪ n 3 + 3n 2 − 9n + 5, R12 = 1, 7,
⎪
⎪
⎪
⎪ n 3 + 3n 2 − 9n − 27, R12 = 3, 9,
⎩ 3
n + 3n 2 − 9n − 11, R12 = 5, 11,
for example. Table 1.8 shows the 60 polynomials of order four in Q 60 for the repre-
sentation of M(n, 5).
84 1 Preliminaries
0
Table 1.8 Coefficients c j,5 (r ) j=4 in M(n, 5) = c4,5 (R60 )Q 460 + c3,5 (R60 )Q 360 +
c2,5 (R60 )Q 260 + c1,5 (R60 )Q 60 + c0,5 (R60 )
r c4,5 (r ), c3,5 (r ), c2,5 (r ), c1,5 (r ), c0,5 (r ) r c4,5 (r ), c3,5 (r ), c2,5 (r ), c1,5 (r ), c0,5 (r )
0 4500, 750, 25/2, −5/2, 0 1 4500, 1050, 115/2, 1/2, 0
2 4500, 1350, 235/2, 3/2, 0 3 4500, 1650, 385/2, 17/2, 0
4 4500, 1950, 565/2, 29/2, 0 5 4500, 2250, 775/2, 55/2, 1
6 4500, 2550, 1015/2, 81/2, 1 7 4500, 2850, 1285/2, 123/2, 2
8 4500, 3150, 1585/2, 167/2, 3 9 4500, 3450, 1915/2, 229/2, 5
10 4500, 3750, 2275/2, 295/2, 7 11 4500, 4050, 2665/2, 381/2, 10
12 4500, 4350, 3085/2, 473/2, 13 13 4500, 4650, 3535/2, 587/2, 18
14 4500, 4950, 4015/2, 709/2, 23 15 4500, 5250, 4525/2, 855/2, 30
16 4500, 5550, 5065/2, 1011/2, 37 17 4500, 5850, 5635/2, 1193/2, 47
18 4500, 6150, 6235/2, 1387/2, 57 19 4500, 6450, 6865/2, 1609/2, 70
20 4500, 6750, 7525/2, 1845/2, 84 21 4500, 7050, 8215/2, 2111/2, 101
22 4500, 7350, 8935/2, 2393/2, 119 23 4500, 7650, 9685/2, 2707/2, 141
24 4500, 7950, 10465/2, 3039/2, 164 25 4500, 8250, 11275/2, 3405/2, 192
26 4500, 8550, 12115/2, 3791/2, 221 27 4500, 8850, 12985/2, 4213/2, 255
28 4500, 9150, 13885/2, 4657/2, 291 29 4500, 9450, 14815/2, 5139/2, 333
30 4500, 9750, 15775/2, 5645/2, 377 31 4500, 10050, 16765/2, 6191/2, 427
32 4500, 10350, 17785/2, 6763/2, 480 33 4500, 10650, 18835/2, 7377/2, 540
34 4500, 10950, 19915/2, 8019/2, 603 35 4500, 11250, 21025/2, 8705/2, 674
36 4500, 11550, 22165/2, 9421/2, 748 37 4500, 11850, 23335/2, 10183/2, 831
38 4500, 12150, 24535/2, 10977/2, 918 39 4500, 12450, 25765/2, 11819/2, 1014
40 4500, 12750, 27025/2, 12695/2, 1115 41 4500, 13050, 28315/2, 13621/2, 1226
42 4500, 13350, 29635/2, 14583/2, 1342 43 4500, 13650, 30985/2, 15597/2, 1469
44 4500, 13950, 32365/2, 16649/2, 1602 45 4500, 14250, 33775/2, 17755/2, 1747
46 4500, 14550, 35215/2, 18901/2, 1898 47 4500, 14850, 36685/2, 20103/2, 2062
48 4500, 15150, 38185/2, 21347/2, 2233 49 4500, 15450, 39715/2, 22649/2, 2418
50 4500, 15750, 41275/2, 23995/2, 2611 51 4500, 16050, 42865/2, 25401/2, 2818
52 4500, 16350, 44485/2, 26853/2, 3034 53 4500, 16650, 46135/2, 28367/2, 3266
54 4500, 16950, 47815/2, 29929/2, 3507 55 4500, 17250, 49525/2, 31555/2, 3765
56 4500, 17550, 51265/2, 33231/2, 4033 57 4500, 17850, 53035/2, 34973/2, 4319
58 4500, 18150, 54835/2, 36767/2, 4616 59 4500, 18450, 56665/2, 38629/2, 4932
Exercises
Exercise 1.3 Express the difference A − B in terms only of intersection and sym-
metric difference, and the union A ∪ B in terms only of intersection and symmetric
difference.
Exercises 85
− · · · + (−1) n−1
|A1 ∩ A2 ∩ · · · ∩ An | (1.E.1)
and
|A1 ∩ A2 ∩ · · · ∩ An | = |Ai | − Ai ∪ A j + Ai ∪ A j ∪ A k
i i< j i< j<k
− · · · + (−1) n−1
|A1 ∪ A2 ∪ · · · ∪ An | , (1.E.2)
− · · · + (−2) n−1
|A1 ∩ A2 ∩ · · · ∩ An | . (1.E.3)
(Hint. As observed in Example 1.1.35, any element in the set A1 ΔA2 Δ · · · ΔAn
belongs to only odd number of sets among {Ai }i=1
n
.)
Exercise 1.9 Is the collection of all non-overlapping open intervals with real end
points countable or uncountable?
Exercise 1.10 Find an injection from the set A to the set B in each of the pairs A
and B below. Here, J0 = J+ ∪ {0}, i.e.,
(1) A = J0 × J0 . B = J0 .
29 When n = 1, 2, . . . and {a }n
i i=0 are all integers with an = 0, a number z satisfying an z +
n
an−1 z n−1 + · · · + a0 = 0 is called an algebraic number. A number which is not an algebraic number
is called a transcendental number.
86 1 Preliminaries
Exercise 1.11 Find a function from the set A to the set B in each of the pairs A and
B below.
(1) A = J0 . B = J0 × J0 .
(2) A = J0 . B = Q.
(3) A = the Cantor set. B = [0, 1].
(4) A = the set of infinite sequences of 0 and 1. B = [0, 1].
Exercise 1.12 Find a one-to-one correspondence between the set A and the set B
in each of the pairs A and B below.
(1) A = (a, b). B = (c, d). Here, −∞ ≤ a < b ≤ ∞ and −∞ ≤ c < d ≤ ∞.
(2) A = the set of infinite sequences of 0, 1, and 2. B = the set of infinite sequences
of 0 and 1.
(3) A = [0, 1). B = [0, 1) × [0, 1).
Exercise 1.15 Is the collection of intervals with rational end points in the space R
of real numbers countable?
Exercise 1.17 Is the sum of two rational numbers a rational number? If we add one
more rational number, is the result a rational number? If we add an infinite number of
rational numbers, is the result a rational number? (Hint. The set of rational numbers
is closed under a finite number of additions, but is not closed under an infinite number
of additions.)
Exercise 1.18 Here, Q denotes the set of rational numbers defined in (1.1.29) and
0 < a < b < 1.
30 One of such one-to-one correspondences is a function called the Gödel pairing function.
Exercises 87
Exercise 1.19 Consider a game between two players. After a countable subset A of
the interval [0, 1] is determined, the two players alternately choose one number from
{0, 1, . . . , 9}. Let the numbers chosen by the first and second players be x0 , x1 , . . .
and y0 , y1 , . . ., respectively. When the number 0.x1 y1 x2 y2 · · · belongs to A, the first
player wins and otherwise the second player wins. Find a way for the second player
to win.
and
where A, B ⊆ R.
Exercise 1.22 Show that the function f (x) = x1 is continuous but not uniformly
continuous on S = (0, ∞).
√
Exercise 1.23 Show that the function f (x) = x is uniformly continuous on S =
(0, ∞).
Exercise 1.26 Obtain the Fourier transform F {u(x)} of the unit step function u(x)
by following the order shown below.
88 1 Preliminaries
2
lim Sα (ω) = . (1.E.10)
α→0 jω
(2) Show that the Fourier transform of the impulse function δ(x) is F {δ(x)} = 1.
Then, show
1
F {u(x)} = πδ(ω) + (1.E.12)
jω
n
δ (x1 , x2 , . . . , xn ) = δ (xi ) . (1.E.13)
i=1
Show
δ (r )
δ (x, y) = (1.E.14)
πr
for r = x 2 + y 2 and
Exercises 89
δ (r )
δ (x, y, z) = (1.E.15)
2πr 2
for r = x 2 + y2 + z2.
Exercise 1.31 Obtain the limits of the sequences below.
∞
(1) 1 + n1 , 2 n=1
∞
(2) 1 + n1 , 2 n=1
∞
(3) 1, 1 + n1 n=1
∞
(4) 1, 1 + n1 n=1
∞
(5) 1 − n1 , 2 n=1
∞
(6) 1 − n1 , 2 n=1
∞
(7) 1, 2 − n1 n=1
∞
(8) 1, 2 − n1 n=1
Exercise 1.32 Consider the sequence { f n (x)}∞
n=1 of functions with
⎧ 2
⎨ 2n x, 0 ≤ x ≤ 2n
1
,
f n (x) = 2n − 2n x, 2n ≤ x ≤ n1 ,
2 1
(1.E.16)
⎩
0, 1
n
≤ x ≤ 1.
1 1
By obtaining 0 lim f n (x)d x and lim 0 f n (x)d x, confirm that the order of inte-
n→∞ n→∞
gration and limit are not always interchangeable.
Exercise 1.33 For the function
⎧ 3
⎨ 2n x, 0 ≤ x ≤ 2n
1
,
f n (x) = 2n − 2n x, 2n ≤ x ≤ n1 ,
2 3 1
(1.E.17)
⎩
0, 1
n
≤ x ≤ 1,
b b
and a number b ∈ (0, 1], obtain 0 lim f n (x)d x and lim 0 f n (x)d x, which shows
n→∞ n→∞
that the order of integration and limit are not always interchangeable.
Exercise 1.34 Obtain the number of all possible arrangements with ten distinct red
balls and ten distinct black balls.
Exercise 1.35 Show31 the identities
and
n
and
n
n
k k n+1
= = . (1.E.22)
k=0
m k=m
m m+1
k0
k1
kn−1
k0
k0
k0
k0 + n − 1
··· 1 = ··· 1 = . (1.E.24)
k1 =1 k2 =1 kn =1 k1 =1 k2 =k1 kn =kn−1
n
Γ (β)Γ (α)
B̃(α, β) = (1.E.25)
Γ (α + β)
confirm (1.E.25).
Exercises 91
and
⎧
⎨ (−1)z (z + 1), z = −1, 0, . . . ,
−z+1
C
−2 z = (−1) (z + 1), z = −2, −3, . . . , (1.E.28)
⎩
±∞, otherwise,
1
Exercise 1.42 Obtain the series expansions of g1 (z) = (1 + z) 2 and g2 (z) = (1 +
z)− 2 .
1
Exercise 1.43 For non-negative numbers α and β such that α + β = 0, show that
αβ 2αβ α+β
≤ min(α, β) ≤ ≤ αβ ≤ ≤ max(α, β). (1.E.29)
α+β α+β 2
References
M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
G.E. Andrews, R. Askey, R. Roy, Special Functions (Cambridge University, Cambridge, 1999)
E. Artin, The Gamma Function (Translated by M Rinehart, and Winston Butler) (Holt, New York,
1964)
B.C. Carlson, Special Functions of Applied Mathematics (Academic, New York, 1977)
J.L. Challifour, Generalized Functions and Fourier Analysis: An Introduction (W. A. Benjamin,
Reading, 1972)
C.A. Charalambides, Enumerative Combinatorics (Chapman and Hall, New York, 2002)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
B.R. Gelbaum, J.M.H. Olmsted, Counterexamples in Analysis (Holden-Day, San Francisco, 1964)
92 1 Preliminaries
We first address the notions of algebra and sigma algebra (Bickel and Doksum 1977;
Leon-Garcia 2008), which are the bases in defining probability.
2.1.1 Algebra
and
if A ∈ A, then Ac ∈ A (2.1.2)
is called an algebra of S.
Example 2.1.2 From de Morgan’s law, (2.1.1), and (2.1.2), we get A ∩ B ∈ A when
A ∈ A and B ∈ A for an algebra A. Subsequently, we have
n
∪ Ai ∈ A (2.1.3)
i=1
and
n
∩ Ai ∈ A (2.1.4)
i=1
In other words, a collection is not an algebra of S if the collection does not include
∅ or S.
Solution The collections A1 = {S, ∅}, A2 = {S, {1}, {2, 3}, ∅}, A3 = {S, {2},
{1, 3}, ∅}, A4 = {S, {3}, {1, 2}, ∅}, and A5 = {S, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3},
∅} are the algebras of S. ♦
Example 2.1.4 Assume J+ = {1, 2, . . .} defined in (1.1.3), and consider the collec-
tion A1 of all the sets obtained from a finite number of unions of the sets {1}, {2}, . . .
each containing a single natural number. Now, J+ is not an element of A1 because
it is not possible to obtain J+ from a finite number of unions of the sets {1}, {2}, . . ..
Consequently, A1 is not an algebra of J+ . ♦
2.1 Algebra and Sigma Algebra 95
Definition 2.1.2 (generated algebra) For a collection C of subsets of a set, the small-
est algebra to which all the element sets in C belong is called the algebra generated
from C and is denoted by A (C).
The implication of A (C) being the smallest algebra is that any algebra to which
all the element sets of C belong also contains all the element sets of A (C) as its
elements.
Example 2.1.5 When S = {1, 2, 3}, the algebra generated from C = {{1}} is A (C) =
A1 = {S, {1}, {2, 3}, ∅} because A2 = {S, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, ∅} con-
tains all the elements of A1 = {S, {1}, {2, 3}, ∅}. ♦
Example 2.1.6 For the collection C = {{a}} of S = {a, b, c, d}, the algebra gener-
ated from C is A (C) = {∅, {a}, {b, c, d}, S}. ♦
∞
Theorem 2.1.3 Let A be an algebra of a set S, and {Ai }i=1 be a sequence of sets in
∞
A. Then, A contains a sequence {Bi }i=1 of sets such that
Bm ∩ Bn = ∅ (2.1.5)
for m = n and
∞ ∞
∪ Bi = ∪ Ai . (2.1.6)
i=1 i=1
In some cases, the results in finite and infinite spaces are different. For example,
although the result from a finite number of set operations on the elements of an algebra
96 2 Fundamentals of Probability
is an element of the algebra, the result from an infinite number of set operations is
not always an element of the algebra. This is similar to the fact that adding a finite
number of rational numbers results always in a rational number while adding an
infinite number of rational numbers sometimes results in an irrational number.
As it is clear from the example above, the algebra is unfortunately not closed
under a countable number of set operations. We now define the notion of σ -algebra
by adding one desirable property to algebra.
F = {∅, {a}, {b}, {c}, {d}, {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d},
{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, S} (2.1.8)
When the collection of all possible outcomes is finite as in a single toss of a coin
or a single rolling of a pair of dice, the limit, i.e., the infinite union in (2.1.7), does
not have significant implications and an algebra is also a sigma algebra. On the other
2.1 Algebra and Sigma Algebra 97
hand, when an algebra contains infinitely many element sets, the result of an infinite
number of unions of the element sets of the algebra does not always belong to the
algebra because an algebra is not closed under an infinite number of set operations.
Such a case occurs when the collection of all possible outcomes is from, for instance,
an infinite toss of a coin or an infinite rolling of a pair of dice.
AΥ = A (G Υ ) (2.1.9)
described in Example 1.1.44 are not elements of the algebra AΥ because an infinite
set cannot be obtained by a finite number of set operations on the element sets of
GΥ . ♦
of intervals [a, b) and [a, ∞) with 0 ≤ a ≤ b < ∞, and the collection F2 obtained
from a finite number of unions of the intervals in F1 . We have [a, a) = ∅ ∈ F1 and
[a, ∞) = Ω ∈ F1 with a = 0. Yet, although [a, b) ∈ F1 for 0 ≤ a ≤ b < ∞, we
have [a, b)c = [0, a) ∪ [b, ∞) ∈/ F1 for 0 < a < b < ∞. Thus, F1 is not an algebra
of Ω. On the other hand, F2 is an algebra1 of Ω because a finite number of unions
of the elements in F1 is an element of F2 , the complement of every element in
1Here, if the condition ‘0 ≤ a ≤ b < ∞’ is replaced with ‘0 ≤ a < b < ∞’, F2 is not an algebra
because the null set is not an element of F2 .
98 2 Fundamentals of Probability
is not an algebra. ♦
Definition 2.1.4 (generated σ -algebra) Consider a collection G of subsets of Ω.
The smallest σ -algebra that contains all the element sets of G is called the σ -algebra
generated from G and is denoted by σ (G).
The implication of the σ -algebra σ (G) being the smallest σ -algebra is that any
σ -algebra which contains all the elements of C will also contain all the elements of
σ (G).
Example 2.1.21 For S = {a, b, c, d}, the σ -algebra generated from C = {{a}} is
σ (C) = {∅, {a}, {b, c, d}, S}. ♦
Example 2.1.22 For the uncountable set Υ = {a = (a1 , a2 , . . .) : ai ∈ {0, 1}} of
one-sided binary sequences, consider the algebra A (G Υ ) and σ -algebra σ (G Υ ) gen-
erated from G Υ = {{ai } : ai ∈ Υ }. Then, as we have observed in Example 2.1.15,
the collection
ΥT ∈ σ (G Υ ) (2.1.16)
A probability space (Gray and Davisson 2010; Loeve 1977) is the triplet (Ω, F, P)
of an abstract space Ω, called the sample space; a sigma algebra F, called the event
space, of the sample space; and a set function P, called the probability measure,
assigning a number in [0, 1] to each of the element sets of the event space.
Definition 2.2.2 (sample space) The collection of all possible outcomes of an exper-
iment is called the sample space of the experiment.
The sample space, often denoted by S or Ω, is basically the same as the abstract
space in set theory.
Definition 2.2.3 (sample point) An element of the sample space is called a sample
point or an elementary outcome.
Example 2.2.5 In toss a coin, the sample space is S = {head, tail} and the sample
points are head and tail. In rolling a fair die, the sample space is S = {1, 2, 3, 4, 5, 6}
and the sample points are 1, 2, . . . , 6. In the experiment of rolling a die until a certain
number appears, the sample space is S = {1, 2, . . .} and the sample points are 1, 2, . . .
when the observation is the number of rolling. ♦
Example 2.2.6 In the experiment of choosing a real number between a and b ran-
domly, the sample space is Ω = (a, b). ♦
The sample spaces in Example 2.2.5 are countable sets, which are often called
discrete sample spaces or discrete spaces. The sample space Ω = (a, b) considered
in Example 2.2.6 is an uncountable space and is called a continuous sample space
or continuous space.
A finite dimensional vector space from a discrete space is, again, a discrete space:
on the other hand, it should be noted that an infinite dimensional vector space from
a discrete space, which is called a sequence space, is a continuous space. A mixture
of discrete and continuous spaces is called a mixed sample space or a hybrid sample
space.
Let us generally denote by I the index set such as the set R of real numbers, the
set
R0 = {x : x ≥ 0} (2.2.1)
Jk = {0, 1, . . . , k − 1} (2.2.2)
Example 2.2.7 When a coin is tossed and a die is rolled at the same time, the sam-
ple space is the discrete combined space S = {(head, 1), (head, 2), . . . , (head, 6),
(tail, 1), (tail, 2), . . . , (tail, 6)} of size 2 × 6 = 12. ♦
Example 2.2.8 When two coins are tossed once or a coin is tossed twice, the sample
space is the discrete combined space S = {(head, head), (head, tail), (tail, head),
(tail, tail)} of size 2 × 2 = 4. ♦
Definition 2.2.5 (event space; event) A sigma algebra obtained from a sample space
is called an event space, and an element of the event space is called an event.
An event in probability theory is roughly another name for a set in set theory.
Nonetheless, a non-measurable set discussed in Appendix 2.4 cannot be an event
and not all measurable sets are events: again, only the element sets of an event space
are events.
Example 2.2.11 Consider the sample space S = {1, 2, 3} and the event space C =
{{1}, {2, 3}, S, ∅}. Then, the subsets {1}, {2, 3}, S, and ∅ of S are events. However,
the other subsets {2}, {3}, {1, 3}, and {1, 2} of S are not events. ♦
As we can easily observe in Example 2.2.11, every event is a subset of the sample
space, but not all the subsets of a sample space are events.
Example 2.2.12 For a coin toss, the sample space is S = {head, tail}. If we assume
the event space F = {S, ∅, {head}, {tail}}, then the sets S, {head}, {tail}, and ∅ are
events, among which {head} and {tail} are elementary events. ♦
102 2 Fundamentals of Probability
Example 2.2.13 For rolling a die with the sample space S = {1, 2, 3, 4, 5, 6}, the
element 1, for example, of S can never be an event. The subset {1, 2, 3} may
sometimes be an event: specifically, the subset {1, 2, 3} is and is not an event for
the event spaces F = {{1, 2, 3}, {4, 5, 6}, S, ∅} and F = {{1, 2}, {3, 4, 5, 6}, S, ∅},
respectively. ♦
In addition, when a set is an event, the complement of the set is also an event even
if it cannot happen.
For a sample space, several event spaces may exist. For a sample space Ω, the
collection {∅, Ω} is the smallest event space and the power set 2Ω , the collection of
all the subsets of Ω as described in Definition 1.1.13, is the largest event space.
Let us now briefly describe why we base the probability theory on the more
restrictive σ -algebra, and not on algebra. As we have noted before, when the sample
space is finite, we could also have based the probability theory on algebra because
an algebra is basically the same as a σ -algebra. However, if the probability theory
is based on algebra when the sample space is an infinite set, it becomes impossible
to take some useful sets3 as events and to consider the limit of events, as we can see
from the examples below.
Example 2.2.16 If the event space is defined not as a σ -algebra but as an algebra,
then the limit is not an element of the event space for some sequences of events. For
example, even when all finite intervals (a, b) are events, no singleton set is an event
because a singleton set cannot be obtained from a finite number of set operations
on intervals (a, b). In a more practical scenario, even if “The voltage measured is
between a and b (V).” is an event, “The voltage measured is a (V).” would not be
an event if the event space were defined as an algebra. ♦
3Among those useful sets is the set ΥT = {all periodic binary sequences} considered in Example
2.1.15.
2.2 Probability Spaces 103
As we have already mentioned in Sect. 2.1.2, when the sample space is finite, the
limit is not so crucial and an algebra is a σ -algebra. Consequently, the probability
theory could also be based on algebra. When the sample space is an infinite set and
the event space is composed of an infinite number of sets, however, an algebra is
not closed under an infinite number of set operations. In such a case, the result of an
infinite number of operations on sets, i.e., the limit of a sequence, is not guaranteed
to be an event. In short, the fact that a σ -algebra is closed under a countable number
of set operations is the reason why we adopt the more restrictive σ -algebra as an
event space, and not the more general algebra.
An event space is a collection of subsets of the sample space closed under a
countable number of unions. We can show, for instance via de Morgan’s law, that an
event space is closed also under a countable number of other set operations such as
difference, complement, intersection, etc. It should be noted that an event space is
closed for a countable number of set operations, but not for an uncountable number
∞
of set operations. For example, the set ∪ Hr is also an event when {Hr }r∞=1 are all
r =1
events, but the set
∪ Br (2.2.4)
r ∈[0,1]
may or may not be an event when {Br : r ∈ [0, 1]} are all events.
Let us now discuss in some detail the condition
∞
∪ Bi ∈ F (2.2.5)
i=1
shown originally in (2.1.7), where {Bn }∞n=1 are all elements of the event space F.
This condition implies that the event space is closed under a countable number of
union operations. Recollect that the limit lim Bn of {Bn }∞
n=1 is defined as
n→∞
⎧
∞
⎪
⎨ ∪ Bi , {Bn }∞
n=1 is a non-decreasing sequence,
lim Bn = i=1
∞ (2.2.6)
n→∞ ⎪
⎩ ∩ Bi , {Bn }∞
n=1 is a non-increasing sequence
i=1
lim Bn ∈ F, {Bn }∞
n=1 is a non-decreasing sequence (2.2.7)
n→∞
Example 2.2.17 Let a sequence {Hn }∞n=1 of events in an event space F be non-
decreasing. As we have seen in (1.5.8) and (2.2.6), the limit of {Hn }∞
n=1 can be
∞
expressed in terms of the countable union as lim Hn = ∪ Hn . Because a countable
n→∞ n=1
104 2 Fundamentals of Probability
number of unions of events results in an event, the limit lim Hn is an event. For
∞ n→∞
example, when 1, 2 − n1 n=1 are all events, the limit [1, 2) of this non-decreasing
sequence of events is an event. In addition, when finite intervals of the form (a, b)
are all events, the limit (−∞, b) of {(−n, b)}∞n=1 will also be an event. Similarly,
assume a non-increasing sequence {Bn }∞n=1 of events in F. The limit of this sequence,
∞
lim Bn = ∩ Bn as shown in (1.5.9) or (2.2.6), will also be an event because it is
n→∞ n=1
a countable intersection of events. Therefore,
∞ intervals (a, b) are all events,
if finite
any singleton set {a}, the limit of a − n1 , a + n1 n=1 , will also be an event. ♦
Example 2.2.18 Let us show the equivalence of (2.2.5) and (2.2.7). We have already
observed that (2.2.7) holds true for a collection of events satisfying (2.2.5). Let us
thus show that (2.2.5) holds true for a collection of events satisfying (2.2.7). Con-
n
∞
sider a sequence {G i }i=1 of events chosen arbitrarily in F and let Hn = ∪ G i . Then,
i=1
∞ ∞
∪ G n = ∪ Hn and {Hn }∞
n=1 is a non-decreasing sequence. In addition, because
n=1 n=1
∞
{Hn }∞
n=1 is a non-decreasing sequence, we have ∪ Hi = lim Hn from (2.2.6). There-
i=1 n→∞
∞ ∞
∞
fore, ∪ G n = ∪ Hn = lim Hn ∈ F. In other words, for any sequence {G i }i=1 of
n=1 n=1 n→∞
events in F satisfying (2.2.7), we have
∞
∪ G n ∈ F. (2.2.8)
n=1
In essence, the two conditions (2.2.5) and (2.2.7) are equivalent, which implies that
(2.2.7), instead of (2.2.5), can be employed as one of the requirements for a collection
to be an event space. Similarly, instead of (2.2.5),
lim Bn ∈ F, {Bn }∞
n=1 is a non-increasing sequence (2.2.9)
n→∞
generated from all open intervals (a, b) in R is called the Borel algebra, Borel sigma
field, or Borel field of R.
The members of the Borel field, i.e., the sets obtained from a countable number
of set operations on open intervals, are called Borel sets.
2.2 Probability Spaces 105
∞
Example 2.2.19 It is possible to see that singleton sets {x} = ∩ x − n1 , x + n1 ,
n=1
half-open intervals [x, y) = (x, y) ∪ {x} and (x, y] = (x, y) ∪ {y}, and closed inter-
vals [x, y] = (x, y) ∪ {x} ∪ {y} are all Borel sets after some set operations. In addi-
tion, half-open intervals [x, +∞) = (−∞, x)c and (−∞, x] = (−∞, x) ∪ {x}, and
open intervals (x, ∞) = (−∞, x]c are also Borel sets. ♦
The Borel σ -algebra B (R) is the most useful and widely-used σ -algebra on the
set of real numbers, and contains all finite and infinite open, closed, and half-open
intervals, singleton sets, and the results from set operations on these sets. On the other
hand, the Borel σ -algebra B (R) is different from the collection of all subsets of R.
In other words, there exist some subsets of real numbers which are not contained in
the Borel σ -algebra. One such example is the Vitali set discussed in Appendix 2.4.
When the sample space is the real line R, we choose the Borel σ -algebra B (R) as
our event space. At the same time, when a subset Ω of real numbers is the sample
space, the Borel σ -algebra
B Ω = G : G = H ∩ Ω , H ∈ B (R) (2.2.11)
of Ω is assumed as the event space. Note that when the sample space is a discrete
subset A of the set of real numbers, Borel σ -algebra B(A) of A is the same as the
power set 2 A of A.
We now consider the notion of probability measure, the third element of a probability
space.
Definition 2.2.8 (measurable space) The pair (Ω, F) of a sample space Ω and an
event space F is called a measurable space.
Let us again mention that when the sample space S is countable or discrete,
we usually assume the power set of the
sample space as the event space: in other
words, the measurable space is S, 2 S . When the sample space S is uncountable or
continuous, we assume the event space described by (2.2.11): in other words, the
measurable space is (S, B(S)).
Definition 2.2.9 (probability measure) On a measurable space (Ω, F), a set func-
tion P assigning a real number P (Bi ) to each set Bi ∈ F under the constraint of the
following four axioms is called a probability measure or simply probability:
Axiom 1.
P (Bi ) ≥ 0. (2.2.12)
106 2 Fundamentals of Probability
Axiom 2.
P(Ω) = 1. (2.2.13)
∞
Axiom 4. When a countable number of events {Bi }i=1 are mutually exclusive,
∞
∞
P ∪ Bi = P (Bi ) . (2.2.15)
i=1
i=1
The probability measure P is a set function assigning a value P(G), called prob-
ability and also denoted by P{G} and Pr{G}, to an event G. A probability measure
is also called a probability function, probability distribution, or distribution.
Axioms 1–4 are also intuitively appealing. The first axiom that a probability
should be not smaller than 0 is in some sense chosen arbitrarily like other measures
such as area, volume, and weight. The second axiom is a mathematical expression
that something happens from an experiment or some outcome will result from an
experiment. The third axiom is called additivity or finite additivity, and implies
that the probability of the union of events with no common element is the sum of
the probability of each event, which is similar to the case of adding areas of non-
overlapping regions.
Axiom 4 is called the countable additivity, which is an asymptotic generalization
of Axiom 3 into the limit. This axiom is the key that differentiates the modern
probability theory developed by Kolmogorov from the elementary probability theory.
When evaluating the probability of an event which can be expressed, for example,
only by the limit of events, Axiom 4 is crucial: such an asymptotic procedure is similar
to obtaining the integral as the limit of a series. It should be noted that (2.2.14) does
not guarantee (2.2.15). In some cases, (2.2.14) is combined into (2.2.15) by viewing
Axiom 3 as a special case of Axiom 4.
If our definition of probability is based on the space of an algebra, then we may not
be able to describe, for example, some probability resulting from a countably infinite
number of set operations. To guarantee the existence of the probability in such a case
as well, we need sigma algebra which guarantees the result of a countably infinite
number of set operations to exist within our space.
From the axioms of probability, we can obtain
P (∅) = 0, (2.2.16)
P B c = 1 − P(B), (2.2.17)
2.2 Probability Spaces 107
and
P(B) ≤ 1, (2.2.18)
4 Here, fair means ‘head and tail are equally likely to occur’.
5 Because the probability measure P is a set function, P({k}) and P({head}), for instance, are the
exact expressions. Nonetheless, the expressions P(k), P{k}, P(head), and P{head} are also used.
6 The Vitali set V discussed in Definition 2.A.12 is a subset in the space of real numbers. Denote
0
∞
the rational numbers in (−1, 1) by {αi }i=1 and assume the translation operation Tt (x) = x + t.
∞
Then, the events Tαi V0 i=1 will produce a contradiction.
108 2 Fundamentals of Probability
2.3 Probability
and7
∞
∞
P ∪ Bi ≤ P (Bi ) . (2.3.2)
i=1 i=1
∞
Property 2. If {Bi }i=1 is a countable partition of the sample space Ω, then
∞
P(G) = P (G ∩ Bi ) (2.3.3)
i=1
for G ∈ F.
Property 3. Denoting the sum over nC_r = n!/{r!(n − r)!} ways of choosing r events from {B_i}_{i=1}^n by Σ_{i_1<i_2<···<i_r} P(B_{i_1} B_{i_2} ··· B_{i_r}), we have
P(⋃_{i=1}^n B_i) = (−1)^0 Σ_{i=1}^n P(B_i) + (−1)^1 Σ_{i_1<i_2} P(B_{i_1} B_{i_2}) + ··· + (−1)^{r−1} Σ_{i_1<i_2<···<i_r} P(B_{i_1} B_{i_2} ··· B_{i_r})
7 Among these properties, (2.3.2) is called the Boole inequality. The Boole inequality (2.3.2) can also be written as two separate formulas, for finite and for countable unions, similarly to Axioms 3 and 4 in Definition 2.2.9.
and
which can also be obtained as p_1 = 1 − 1/8 = 7/8 by noting that the event 'head occurs at least once' is the complementary event of 'tail occurs three times'. ♦
Note that probability being zero for an event does not necessarily mean that the
event does not occur or, equivalently, that the event is the same as the ‘impossible’
event ∅. For example, in the space of real numbers, although P({a}) = 0 for any
value of a, the event {a} is different from ∅. In general, A and B are called the same
in probability when P(A) = P(B). Similarly, when
P(A) = P(B) = P(AB),   (2.3.8)
or, equivalently, when
P(AΔB) = 0,   (2.3.9)
A and B are called the same with probability 1, in which⁸ 'with probability 1' can
be replaced by ‘almost everywhere (a.e.)’, ‘almost always’, ‘almost certainly (a.c.)’,
‘almost surely (a.s.)’, and ‘almost every point’. For example, when P(A) = 0, A is the
same as ∅ almost surely because P(∅) = P(A) = P(A ∩ ∅) = 0. When P(A) = 1,
A is the same as Ω almost surely because P(Ω) = P(A) = P(A ∩ Ω) = 1. Note
that A being the same as B almost surely does not necessarily mean A = B.
Example 2.3.3 For the sample space Ω = [0, 1], let the probability of an inter-
val be the length of the interval. Consider the four intervals A_1 = [0.1, 0.2], A_2 = [0.1, 0.2), A_3 = (0.1, 0.2], and A_4 = (0.1, 0.2). Although P(A_i) = P(A_j) = P(A_i A_j) = 0.1 for any i and j, it is clear that A_i ≠ A_j for i ≠ j. In other words, when i ≠ j, A_i and A_j are the same in probability and with probability 1, but they
are not the same event. In addition, A1 and B = [0.3, 0.4] are the same in probability
because P ( A1 ) = P(B) = 0.1, but they are neither the same with probability 1 nor
the same. ♦
Σ_{i=1}^n P(A_i) − Σ_{i=1}^n Σ_{k=i+1}^n P(A_i ∩ A_k) ≤ P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i).   (2.3.10)
Proof If we let
B_i = A_i − ⋃_{k=i+1}^n A_k for i = 1, 2, ..., n − 1, and B_n = A_n,   (2.3.11)
then we have P(⋃_{i=1}^n A_i) = P(⋃_{i=1}^n B_i) from ⋃_{i=1}^n B_i = ⋃_{i=1}^n A_i and, subsequently,
P(⋃_{i=1}^n A_i) = Σ_{i=1}^n P(B_i).   (2.3.12)
Next, recollect that P(C − D) = P(C) − P(CD) for two sets C and D from P(C) = P(C − D) + P(CD) because {C − D, CD} is a partition of C. Then, noting that P(A_i ∩ ⋃_{k=i+1}^n A_k) = P(⋃_{k=i+1}^n (A_i ∩ A_k)) because A_i ∩ ⋃_{k=i+1}^n A_k = ⋃_{k=i+1}^n (A_i ∩ A_k) from (1.1.22), we have P(B_i) = P(A_i − ⋃_{k=i+1}^n A_k) = P(A_i) − P(A_i ∩ ⋃_{k=i+1}^n A_k), i.e.,
P(B_i) = P(A_i) − P(⋃_{k=i+1}^n (A_i ∩ A_k)).   (2.3.15)
Thus
P(A_i) − Σ_{k=i+1}^n P(A_i ∩ A_k) ≤ P(A_i) − P(⋃_{k=i+1}^n (A_i ∩ A_k)) = P(B_i)   (2.3.16)
from (2.3.15) because P(⋃_{k=i+1}^n (A_i ∩ A_k)) ≤ Σ_{k=i+1}^n P(A_i ∩ A_k). Now, from (2.3.12) and (2.3.16), we get
Σ_{i=1}^n P(A_i) − Σ_{i=1}^n Σ_{k=i+1}^n P(A_i ∩ A_k) ≤ Σ_{i=1}^n P(B_i) = P(⋃_{i=1}^n A_i),   (2.3.17)
Example 2.3.4 Assume the sample space S = {1, 2, ..., 10}, event space 2^S, and probability measure P(k) = k/55 for k = 1, 2, ..., 10. Consider the three events A_1 = {1, 2, 3}, A_2 = {3, 4, 5, 6}, and A_3 = {5, 6, 7, 8}. First, we have P(⋃_{i=1}^3 A_i) = 36/55. We also get Σ_{i=1}^3 P(A_i) = 50/55 from P(A_1) = 6/55, P(A_2) = 18/55, and P(A_3) = 26/55. Finally, from P(A_1 ∩ A_2) = 3/55, P(A_2 ∩ A_3) = 11/55, and P(A_3 ∩ A_1) = 0, we have Σ_{i=1}^3 P(A_i) − Σ_{i=1}^3 Σ_{k=i+1}^3 P(A_i ∩ A_k) = 50/55 − (3/55 + 11/55) = 36/55. In short, (2.3.10) is confirmed. ♦
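As a numerical cross-check of Example 2.3.4 and the bound (2.3.10), the short Python sketch below (our own illustration; the function name P and the variable names are not from the text) enumerates the probabilities with exact fractions.

```python
from fractions import Fraction

# Probability measure of Example 2.3.4: P({k}) = k/55 for k = 1, ..., 10.
def P(event):
    return sum(Fraction(k, 55) for k in event)

A1, A2, A3 = {1, 2, 3}, {3, 4, 5, 6}, {5, 6, 7, 8}
events = [A1, A2, A3]

union = P(A1 | A2 | A3)                              # P(A1 U A2 U A3) = 36/55
single = sum(P(A) for A in events)                   # sum of P(Ai) = 50/55
pairs = sum(P(events[i] & events[k])                 # sum over i < k of P(Ai ∩ Ak) = 14/55
            for i in range(3) for k in range(i + 1, 3))

assert single - pairs <= union <= single             # the bound (2.3.10)
print(union, single - pairs, single)                 # 36/55 36/55 10/11
```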
When all the outcomes from a random experiment are equally likely, the probability
of an event can be defined by the ratio of the number of desired outcomes to the total
number of outcomes. Specifically, the probability of A is given by the ratio
P(A) = N_A / N,   (2.3.18)
where N is the number10 of all possible outcomes and N A is the number of desired
outcomes for A. The condition of equally likely occurrence is the key in the classical
definition.
Example 2.3.5 Obtain the probability of head when a fair coin is tossed once.
Solution There are two equally likely possible outcomes, head and tail, among which the desired outcome is head. Thus, the probability of head is 1/2. ♦
Example 2.3.6 Obtain the probability P_3 that the three pieces are all longer than 1/4 when a rod of length 1 is divided into three pieces by choosing two points at random.
Solution View the rod of length 1 as the interval [0, 1] on the real line, and let the coordinates of the two cutting points be x and y with 0 < x < 1 and 0 < y < 1 as shown in Fig. 2.2. The three pieces will all be longer than 1/4 when x < y, 1/4 < x < 1/2, and x + 1/4 < y < 3/4 or when x > y, 1/4 < y < 1/2, and y + 1/4 < x < 3/4. Therefore,
from
P_3 = (area of the region of desired outcomes) / (area of the whole region),   (2.3.19)
we get P_3 = (A_1 + A_2)/A_T = ∫_{1/4}^{1/2} ∫_{x+1/4}^{3/4} 1 dy dx + ∫_{1/4}^{1/2} ∫_{y+1/4}^{3/4} 1 dx dy, i.e.,
10 Note that the number is sometimes replaced by another quantity, such as area, volume, or length, as is shown in Example 2.3.6.
P_3 = 1/16   (2.3.20)
referring to Fig. 2.3, where A_T = 1 denotes the area of the whole region {0 < x < 1, 0 < y < 1}, A_1 is the area of the region {x < y, 1/4 < x < 1/2, x + 1/4 < y < 3/4}, and A_2 is the area of the region {x > y, y + 1/4 < x < 3/4, 1/4 < y < 1/2}. ♦
Example 2.3.7 Obtain the probability P_T that the three pieces will form a triangle when a rod is divided into three pieces by choosing two points at random.
Solution Let the length of the rod be 1, and follow the description in Example 2.3.6. Then, the lengths of the three pieces are x, y − x, and 1 − y when x < y and y, x − y, and 1 − x when x > y. Thus, for the three pieces to form a triangle, it is required that 0 < x < 1, 0 < y < 1, x < y, y > 1/2, y < x + 1/2, x < 1/2 or 0 < x < 1, 0 < y < 1, x > y, x > 1/2, x < y + 1/2, y < 1/2 because the sum of the lengths of any two pieces should be longer than the length of the remaining piece. Consequently, from P_T = (A_1 + A_2)/A_T, we get
P_T = 1/4,   (2.3.22)
where A_T = 1 is the area of the region {0 < x < 1, 0 < y < 1}, A_1 = 1/8 is the area of the region {0 < x < 1, 0 < y < 1, x < y, y > 1/2, y < x + 1/2, x < 1/2}, and A_2 = 1/8 is the area of the region {0 < x < 1, 0 < y < 1, x > y, x > 1/2, x < y + 1/2, y < 1/2}. A similar problem will be discussed in Example 3.4.2. ♦
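Readers who wish to verify the geometric answers 1/16 and 1/4 of Examples 2.3.6 and 2.3.7 empirically can do so with a minimal Monte Carlo sketch such as the one below (our own illustration; the function name, sample size, and seed are arbitrary choices).

```python
import random

def estimate(trials: int = 1_000_000, seed: int = 0) -> tuple[float, float]:
    """Estimate P3 (all pieces longer than 1/4) and PT (pieces form a triangle)."""
    rng = random.Random(seed)
    long_enough = triangle = 0
    for _ in range(trials):
        x, y = sorted((rng.random(), rng.random()))
        a, b, c = x, y - x, 1 - y          # lengths of the three pieces
        if min(a, b, c) > 0.25:
            long_enough += 1
        if a < b + c and b < a + c and c < a + b:
            triangle += 1
    return long_enough / trials, triangle / trials

print(estimate())   # roughly (0.0625, 0.25)
```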
(Figs. 2.4–2.6: geometric constructions for the three solutions below, showing the points A, B, D, E, F, G, H, K, M and the radius r/2.)
Solution (Solution 1) Assume that the center point M of the chord is chosen randomly. As shown in Fig. 2.4, l ≥ √3 r is satisfied if the center point M is located in or on the circle C_1 of radius r/2 with the same center as that of C. Thus, P_B = (πr²/4)(πr²)^{−1} = 1/4.
(Solution 2) Assume that the point B is selected randomly on the circle C with the point A fixed. As shown in Fig. 2.5, l ≥ √3 r is satisfied when the point B is on the shorter arc DE, where D and E are the two points (2/3)πr apart from A along C in the two directions. Therefore, we have P_B = (2/3)πr (2πr)^{−1} = 1/3 because the length of the shorter arc DE is (2/3)πr.
(Solution 3) Assume that the chord AB is drawn orthogonal to a diameter FK of C. As shown in Fig. 2.6, we then have l ≥ √3 r if the center point M is located between the two points H and G, located r/2 apart from K toward F and from F toward K, respectively. Therefore, P_B = r/(2r) = 1/2.
This example illustrates that the experiment should be described clearly to obtain the probability appropriately with the classical definition. ♦
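The three answers 1/4, 1/3, and 1/2 can also be reproduced by simulating the three chord-selection mechanisms. The sketch below is our own illustration under the stated modelling assumptions, using a unit-radius circle and counting chords of length at least √3 r.

```python
import math
import random

def bertrand(trials: int = 200_000, seed: int = 1) -> tuple[float, float, float]:
    rng, r = random.Random(seed), 1.0
    hits = [0, 0, 0]
    for _ in range(trials):
        # 1) random midpoint uniformly inside the circle
        while True:
            mx, my = rng.uniform(-r, r), rng.uniform(-r, r)
            if mx * mx + my * my <= r * r:
                break
        if 2 * math.sqrt(r * r - (mx * mx + my * my)) >= math.sqrt(3) * r:
            hits[0] += 1
        # 2) random endpoint on the circle, the other endpoint fixed
        theta = rng.uniform(0, 2 * math.pi)
        if 2 * r * abs(math.sin(theta / 2)) >= math.sqrt(3) * r:
            hits[1] += 1
        # 3) random distance of the chord from the center along a fixed diameter
        d = rng.uniform(0, r)
        if 2 * math.sqrt(r * r - d * d) >= math.sqrt(3) * r:
            hits[2] += 1
    return tuple(h / trials for h in hits)

print(bertrand())   # roughly (0.25, 0.333, 0.5)
```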
Probability can also be defined in terms of the relative frequency of desired outcomes
in a number of repetitions of a random experiment. Specifically, the relative frequency
of a desired event A can be defined as
q_n(A) = n_A / n,   (2.3.23)
of the relative frequency. One drawback of this definition is that the limit shown in
(2.3.24) can be obtained only by an approximation in practice.
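As a small illustration of the relative frequency q_n(A) in (2.3.23) (our own sketch; the seed and the values of n are arbitrary), the following code tracks the fraction of heads in n tosses of a fair coin; any finite n only approximates the limiting value 1/2.

```python
import random

rng = random.Random(42)
for n in (10, 100, 1_000, 10_000, 100_000):
    n_A = sum(rng.random() < 0.5 for _ in range(n))   # number of heads in n tosses
    print(n, n_A / n)                                 # relative frequency q_n(A), approaching 0.5
```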
Example 2.3.11 The probability that a person of a certain age will survive for a
year will differ from year to year, and thus it is difficult to use the classical definition
of probability. As an alternative in such a case, we assume that the tendency in the
future will be the same as that so far, and then compute the probability as the relative
frequency based on the records over a long period for the same age. This method is
often employed in determining an insurance premium. ♦
P(A|B) = P(A ∩ B) / P(B)   (2.4.1)
In other words, the conditional probability P(A|B) of event A under the assump-
tion that event B has occurred is the probability of A with the sample space Ω replaced
by the conditioning event B. Often, the event A conditioned on B is denoted by A|B.
From (2.4.1), we easily get
P(A|B) = P(A) / P(B)   (2.4.2)
when A ⊆ B, and
P(A|B) = 1   (2.4.3)
when B ⊆ A.
Example 2.4.1 When the conditioning event B is the sample space Ω, we have
P(A|Ω) = P(A) because A ∩ B = A ∩ Ω = A and P(B) = P(Ω) = 1. ♦
Example 2.4.2 Consider the rolling of a fair die. Assume that we know the outcome
is an even number. Obtain the probability that the outcome is 2.
Solution Let A = {2} and B = {2, 4, 6}. Then, because A ∩ B = A, P(A ∩ B) = P(A) = 1/6, and P(B) = 1/2 ≠ 0, we have P(A|B) = (1/6)(1/2)^{−1} = 1/3. Again, P(A|B) is the probability of A = {2} when B = {an even number} = {2, 4, 6} is assumed as the sample space. ♦
Example 2.4.3 The probability for any child to be a girl is α and is not influenced
by other children. Assume that Dr. Kim has two children. Obtain the probabilities in
the following two separate cases: (1) the probability p1 that the younger child is a
daughter when Dr. Kim says “The elder one is a daughter”, and (2) the probability
p2 that the other child is a daughter when Dr. Kim says “One of my children is a
daughter”.
Solution We have p_1 = P(D_2|D_1) = P(D_1 ∩ D_2)/P(D_1) = α²/α = α, where D_1 and D_2 denote the events of the first and second child being a daughter, respectively. Similarly, because P(C_A) = P(D_1 ∩ D_2) + P(D_1 ∩ B_2) + P(B_1 ∩ D_2) = 2α − α², we get p_2 = P(C_B|C_A) = P(C_A ∩ C_B)/P(C_A) = α²/(2α − α²) = α/(2 − α), where B_1 and B_2 denote the events of the first and second child being a boy, respectively, and C_A and C_B denote the events of one and the other child being a daughter, respectively. ♦
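The answers p_1 = α and p_2 = α/(2 − α) of Example 2.4.3 can be checked by conditioning a simulation on the two statements separately, as in the sketch below (ours; α = 0.5 is chosen only for illustration).

```python
import random

def two_children(alpha: float = 0.5, trials: int = 500_000, seed: int = 3):
    rng = random.Random(seed)
    elder_d = elder_d_and_younger_d = 0     # for p1: condition on "the elder is a daughter"
    at_least_one_d = both_d = 0             # for p2: condition on "one of them is a daughter"
    for _ in range(trials):
        elder, younger = rng.random() < alpha, rng.random() < alpha
        if elder:
            elder_d += 1
            elder_d_and_younger_d += younger
        if elder or younger:
            at_least_one_d += 1
            both_d += elder and younger
    return elder_d_and_younger_d / elder_d, both_d / at_least_one_d

print(two_children())   # roughly (0.5, 0.333) = (alpha, alpha / (2 - alpha))
```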
P(∩_{i=1}^n B_i) = P(B_1) P(B_2|B_1) ··· P(B_n|B_1 B_2 ··· B_{n−1}),   (2.4.4)
which is called the multiplication theorem. Similarly, the probability of the union ⋃_{i=1}^n B_i can be expressed as
P(⋃_{i=1}^n B_i) = P(B_1) + P(B_1^c B_2) + ··· + P(B_1^c B_2^c ··· B_{n−1}^c B_n),   (2.4.5)
P(B_1 ∩ B_2) = P(B_1) P(B_2|B_1) = P(B_2) P(B_1|B_2)   (2.4.6)
when n = 2. Now, from 1 = P(Ω|B_2) = P(B_1^c ∪ B_1 | B_2) = P(B_1^c|B_2) + P(B_1|B_2), we have P(B_1^c|B_2) = 1 − P(B_1|B_2). Using this result and (2.4.6), (2.4.5) for n = 2 can be written as P(B_1 ∪ B_2) = P(B_1) + P(B_1^c B_2) = P(B_1) + P(B_2) P(B_1^c|B_2) = P(B_1) + P(B_2){1 − P(B_1|B_2)}, i.e.,
A = AB ∪ AB^c,   (2.4.8)
from (2.2.14) because AB and AB^c are mutually exclusive. The result (2.4.9) shows that the probability of A is the weighted sum of the conditional probabilities of A given B and given B^c, with weights equal to the probabilities of the conditioning events B and B^c, respectively.
The result (2.4.9) is quite useful when a direct calculation of the probability of an
event is not straightforward.
Example 2.4.5 In Box 1, we have two white and four black balls. Box 2 contains one white and one black ball. We randomly take one ball from Box 1, put it into Box 2, and then randomly take one ball from Box 2. Find the probability P_W that the ball taken from Box 2 is white.
Solution Let the events of a white ball from Box 1 and a white ball from Box 2 be C and D, respectively. Then, because P(C) = 1/3, P(D|C) = 2/3, P(D|C^c) = 1/3, and P(C^c) = 1 − P(C) = 2/3, we get P_W = P(D) = P(D|C)P(C) + P(D|C^c)P(C^c) = 4/9. ♦
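A direct simulation of the two-box experiment (a sketch we add for illustration; the trial count and seed are arbitrary) confirms P_W = 4/9 ≈ 0.444.

```python
import random

def box_experiment(trials: int = 400_000, seed: int = 7) -> float:
    rng = random.Random(seed)
    white = 0
    for _ in range(trials):
        box1 = ['w', 'w', 'b', 'b', 'b', 'b']        # two white, four black
        box2 = ['w', 'b']                            # one white, one black
        box2.append(rng.choice(box1))                # move a random ball from Box 1 to Box 2
        white += rng.choice(box2) == 'w'             # draw from Box 2
    return white / trials

print(box_experiment())   # roughly 0.444 = 4/9
```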
Example 2.4.6 The two numbers of the upward faces are added after rolling a pair
of fair dice. Obtain the probability α that 5 appears before 7 when we continue the
rolling until the outcome is 5 or 7.
Solution Let A_n and B_n be the events that the outcome is neither 5 nor 7 and that the outcome is 5, respectively, at the n-th rolling for n = 1, 2, .... Then, P(A_n) = 1 − P(5) − P(7) = 13/18 from P(B_n) = P(5) = 4/36 = 1/9 and P(7) = 6/36 = 1/6. Now, α = P(B_1 ∪ (A_1 B_2) ∪ (A_1 A_2 B_3) ∪ ···) = Σ_{n=1}^∞ P(A_1 A_2 ··· A_{n−1} B_n) from (2.4.5) because {A_1 A_2 ··· A_{n−1} B_n}_{n=1}^∞ are mutually exclusive, where we assume A_0 B_1 = B_1. Here,
Example 2.4.7 Example 2.4.6 can be viewed in a more intuitive way as follows:
Consider two mutually exclusive events A and B from a random experiment and
assume the experiments are repeated. Then, the probability that A occurs before B
can be obtained as
(probability of A) / (probability of A or B) = P(A) / {P(A) + P(B)}.   (2.4.11)
Solving Example 2.4.6 based on (2.4.11), we get P(5 appears before 7) = (1/9)(1/9 + 1/6)^{−1} = 2/5. ♦
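The value 2/5 obtained in Examples 2.4.6 and 2.4.7 is easy to confirm by simulating the repeated rolls, as in the following sketch (our own illustration).

```python
import random

def five_before_seven(trials: int = 300_000, seed: int = 11) -> float:
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        while True:
            total = rng.randint(1, 6) + rng.randint(1, 6)
            if total == 5:
                wins += 1
                break
            if total == 7:
                break
    return wins / trials

print(five_before_seven())   # roughly 0.4 = 2/5
```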
Let us now generalize the number of conditioning events in (2.4.9). Assume a collection {B_j}_{j=1}^n of mutually exclusive events and let A ⊆ ⋃_{j=1}^n B_j. Then, P(A) = P(⋃_{j=1}^n (A ∩ B_j)) can be expressed as
P(A) = Σ_{j=1}^n P(AB_j)   (2.4.12)
because A = A ∩ (⋃_{j=1}^n B_j) = ⋃_{j=1}^n (A ∩ B_j) and {A ∩ B_j}_{j=1}^n are all mutually exclusive. Now, recollecting that P(AB_j) = P(A|B_j) P(B_j), we get the following theorem, called the total probability theorem:
P(A) = Σ_{j=1}^n P(A|B_j) P(B_j)   (2.4.13)
when {B_j}_{j=1}^n is a collection of disjoint events and A ⊆ ⋃_{j=1}^n B_j.
j=1
Example 2.4.8 Let A = {1, 2, 3} in the experiment of rolling a fair die. When B_1 = {1, 2} and B_2 = {3, 4, 5, 6}, we have P(A) = 1/2 = 1 × (1/3) + (1/4) × (2/3) because P(B_1) = 1/3, P(B_2) = 2/3, P(A|B_1) = P(AB_1)/P(B_1) = (1/3) × (1/3)^{−1} = 1, and P(A|B_2) = P(AB_2)/P(B_2) = (1/6) × (2/3)^{−1} = 1/4. Similarly, when B_1 = {1} and B_2 = {2, 3, 5}, we get P(A) = 1/2 = 1 × (1/6) + (2/3) × (1/2) from P(B_1) = 1/6, P(B_2) = 1/2, P(A|B_1) = P(AB_1)/P(B_1) = 1, and P(A|B_2) = P(AB_2)/P(B_2) = 2/3. ♦
Example 2.4.9 Assume a group comprising 60% women and 40% men. Among the
women, 45% play violin, and 25% of the men play violin. A person chosen randomly
from the group plays violin. Find the probability that the person is a man.
Solution Denote the events of a person being a man and a woman by M and W ,
respectively, and playing violin by V . Then, using (2.4.1) and (2.4.13), we get
P(M|V) = P(MV)/P(V) = P(V|M)P(M) / {P(V|M)P(M) + P(V|W)P(W)} = 10/37 because M^c = W, P(W) = 0.6, P(M) = 0.4, P(V|W) = 0.45, and P(V|M) = 0.25. ♦
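The posterior 10/37 ≈ 0.27 of Example 2.4.9 is one line of arithmetic; the snippet below (ours) simply spells out (2.4.13) and (2.4.15) for this example.

```python
# Bayes' theorem for Example 2.4.9.
P_M, P_W = 0.4, 0.6                      # prior probabilities of man / woman
P_V_given_M, P_V_given_W = 0.25, 0.45    # probabilities of playing violin

P_V = P_V_given_M * P_M + P_V_given_W * P_W        # total probability, as in (2.4.13)
P_M_given_V = P_V_given_M * P_M / P_V              # posterior, as in (2.4.15)
print(P_M_given_V, 10 / 37)                        # both about 0.2703
```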
P(B_k|A) = P(A|B_k) P(B_k) / P(A)   (2.4.14)
because P(B_k|A) = P(B_k A)/P(A) and P(B_k A) = P(A|B_k) P(B_k) from the definition of conditional probability. Now, combining the results (2.4.13) and (2.4.14) when the events {B_i}_{i=1}^n are all mutually exclusive and A ⊆ ⋃_{i=1}^n B_i, we get the following result, called the Bayes' theorem:
P(B_k|A) = P(A|B_k) P(B_k) / Σ_{j=1}^n P(A|B_j) P(B_j)   (2.4.15)
when {B_j}_{j=1}^n is a collection of disjoint events, A ⊆ ⋃_{j=1}^n B_j, and P(A) ≠ 0.
Example 2.4.10 In Example 2.4.5, obtain the probability that the ball drawn from
Box 1 is white when the ball drawn from Box 2 is white.
Solution Using the results of Example 2.4.5 and the Bayes' theorem, we have P(C|D) = P(CD)/P(D) = P(D|C)P(C)/P(D) = 1/2. ♦
Example 2.4.11 For the random experiment of rolling a fair die, assume A = {2, 4, 5, 6}, B_1 = {1, 2}, and B_2 = {3, 4, 5}. Obtain P(B_2|A).
Solution We easily get P(A) = 2/3. On the other hand, Σ_{j=1}^2 P(A|B_j) P(B_j) = 1/2 from P(B_1) = 1/3, P(B_2) = 1/2, P(A|B_1) = P(AB_1)/P(B_1) = (1/6) × (1/3)^{−1} = 1/2, and P(A|B_2) = P(AB_2)/P(B_2) = (1/3) × (1/2)^{−1} = 2/3. In other words, because A = {2, 4, 5, 6} is not a subset of B_1 ∪ B_2 = {1, 2, 3, 4, 5}, we have P(A) ≠ Σ_{j=1}^2 P(A|B_j) P(B_j). Thus, we would get P(B_2|A) = P(A|B_2) P(B_2) / Σ_{j=1}^2 P(A|B_j) P(B_j) = 2/3, an incorrect answer, if we use (2.4.15) carelessly. The correct answer P(B_2|A) = P(A|B_2) P(B_2) / P(A) = 1/2 can be obtained by using (2.4.14) in this case. ♦
Let us consider an example for the application of the Bayes’ theorem.
Example 2.4.12 Assume four boxes with 2000, 500, 1000, and 1000 parts of a
machine, respectively. The probability of a part being defective is 0.05, 0.4, 0.1, and
0.1, respectively, for the four boxes.
(1) When a box is chosen at random and then a part is picked randomly from the
box, calculate the probability that the part is defective.
(2) Assuming the part picked is defective, calculate the probability that the part is
from the second box.
(3) Assuming the part picked is defective, calculate the probability that the part is
from the third box.
Solution Let A and B_i be the events that the part picked is defective and the part is from the i-th box, respectively. Then, P(B_i) = 1/4 for i = 1, 2, 3, 4. In addition, the value of P(A|B_2), for instance, is 0.4 because P(A|B_2) denotes the probability that a part picked is defective when it is from the second box.
(1) Noting that {B_i}_{i=1}^4 are all disjoint, we get P(A) = Σ_{i=1}^4 P(A|B_i) P(B_i) = (1/4) × 0.05 + (1/4) × 0.4 + (1/4) × 0.1 + (1/4) × 0.1, i.e.,
P(A) = 13/80   (2.4.16)
from (2.4.13).
(2) The probability to obtain is P(B_2|A). We get P(B_2|A) = P(A|B_2) P(B_2) / Σ_{j=1}^4 P(A|B_j) P(B_j) = (0.4 × 1/4)/(13/80) = 8/13 as shown in (2.4.15) because P(A) = 13/80 from (2.4.16), P(B_2) = 1/4, and P(A|B_2) = 0.4.
(3) Similarly, we get¹² P(B_3|A) = (0.1 × 1/4)/(13/80) = 2/13 from P(B_3|A) = P(A|B_3) P(B_3)/P(A), P(B_3) = 1/4, P(A|B_3) = 0.1, and P(A) = 13/80.
♦
If we similarly calculate the probabilities for the first and fourth boxes and then add the four values in Example 2.4.12, we will get 1: in other words, we have Σ_{i=1}^4 P(B_i|A) = P(Ω|A) = 1.
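The same arithmetic for all four boxes of Example 2.4.12, including the check that the four posteriors add to 1, can be written out as follows (our own illustration of (2.4.13) and (2.4.15) with exact fractions).

```python
from fractions import Fraction

priors = [Fraction(1, 4)] * 4                                   # P(B_i) = 1/4
defect = [Fraction(1, 20), Fraction(2, 5), Fraction(1, 10), Fraction(1, 10)]  # P(A | B_i)

P_A = sum(p * q for p, q in zip(defect, priors))                # total probability: 13/80
posteriors = [p * q / P_A for p, q in zip(defect, priors)]      # Bayes' theorem
print(P_A)              # 13/80
print(posteriors)       # 1/13, 8/13, 2/13, 2/13
print(sum(posteriors))  # 1
```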
Assume two boxes. Box 1 contains one red ball and two green balls, and Box 2 contains two red balls and four green balls. If we pick a ball randomly after choosing a box with probability P(Box 1) = p = 1 − P(Box 2), then we have P(red ball) = P(red ball|Box 1) P(Box 1) + P(red ball|Box 2) P(Box 2) = (1/3)p + (1/3)(1 − p) = 1/3 and P(green ball) = 2/3. Note that
P(red ball) = P(red ball|Box 1) = P(red ball|Box 2)
12 Here, 13/80 = 0.1625, 8/13 ≈ 0.6154, and 2/13 ≈ 0.1538.
and P(green ball) = P(green ball|Box 1) = P(green ball|Box 2): whichever box we choose, and whatever the probability of choosing a box is, the probabilities that the ball picked is red and green are 1/3 and 2/3, respectively. In other words, the choice
of a box does not influence the probability of the color of the ball picked. On the
other hand, if Box 1 contains one red ball and two green balls and Box 2 contains
two red balls and one green ball, the choice of a box will influence the probability
of the color of the ball picked. Such an influence is commonly represented by the
notion of independence.
Definition 2.4.2 (independence of two events) If the probability P(AB) of the inter-
section of two events A and B is equal to the product P(A)P(B) of the probabilities
of the two events, i.e., if
P(AB) = P(A)P(B),   (2.4.17)
then A and B are called independent (of each other) or mutually independent.
Example 2.4.13 Assume the sample space S = {1, 2, ..., 9} and P(k) = 1/9 for k = 1, 2, ..., 9. Consider the events A = {1, 2, 3} and B = {3, 4, 5}. Then, P(A) = 1/3, P(B) = 1/3, and P(AB) = P(3) = 1/9, and therefore P(AB) = P(A)P(B). Thus, A and B are independent of each other. Likewise, for the sample space S = {1, 2, ..., 6}, the events C = {1, 2, 3} and D = {3, 4} are independent of each other when P(k) = 1/6 for k = 1, 2, ..., 6. ♦
When one of two events has probability 1 as the sample space S or 0 as the null
set ∅, the two events are independent of each other because (2.4.18) holds true.
Theorem 2.4.4 An event with probability 1 or 0 is independent of any other event.
Example 2.4.14 Assume the sample space S = {1, 2, ..., 5} and let P(k) = 1/5 for k = 1, 2, ..., 5. Then, no two sets, excluding S and ∅, are independent of each other. When P(1) = 1/10, P(2) = P(3) = P(4) = 1/5, and P(5) = 3/10 for the sample space S = {1, 2, ..., 5}, the events A = {3, 4} and B = {4, 5} are independent because P(A)P(B) = P(4) from P(A) = 2/5, P(B) = 1/2, and P(4) = 1/5. ♦
In general, two mutually exclusive events are not independent of each other: on
the other hand, we have the following theorem from Theorem 2.4.4:
Theorem 2.4.5 If at least one event has probability 0, then two mutually exclusive
events are independent of each other.
Example 2.4.15 For the sample space S = {1, 2, 3}, let the power set 2^S = {∅, {1}, {2}, ..., S} be the event space. Assume the probability measure P(1) = 0, P(2) = 1/3, and P(3) = 2/3. Then, the events {2} and {3} are mutually exclusive, but not independent of each other because P(2)P(3) = 2/9 ≠ 0 = P(∅). On the other hand, the events {1} and {2} are mutually exclusive and, at the same time, independent of each other. ♦
Theorem 2.4.6 If the events A and B are independent of each other, then A and B c
are also independent of each other, P(A|B) = P(A), and P(B|A) = P(B).
Example 2.4.16 Assume the sample space S = {1, 2, ..., 6} and probability measure P(k) = 1/6 for k = 1, 2, ..., 6. The events A = {1, 2, 3} and B = {3, 4} are independent of each other as we have already observed in Example 2.4.13. Here, B^c = {1, 2, 5, 6} and thus P(B^c) = 2/3 and P(AB^c) = P({1, 2}) = 1/3. In other words, A and B^c are independent of each other because P(AB^c) = P(A)P(B^c). We also have P(A|B) = 1/2 = P(A) and P(B|A) = 1/3 = P(B). ♦
Example 2.4.17 When A, B, and C are independent of each other with P(AB) = 1/3, P(BC) = 1/6, and P(AC) = 2/9, obtain the probability of C.
Example 2.4.18 For the sample space Ω = {1, 2, 3, 4} of equally likely outcomes, consider A_1 = {1, 2}, A_2 = {2, 3}, and A_3 = {1, 3}. Then, A_1 and A_2 are independent of each other, A_2 and A_3 are independent of each other, and A_3 and A_1 are independent of each other because P(A_1) = P(A_2) = P(A_3) = 1/2, P(A_1 A_2) = P({2}) = 1/4, P(A_2 A_3) = P({3}) = 1/4, and P(A_3 A_1) = P({1}) = 1/4. However, A_1, A_2, and A_3 are not independent of each other because P(A_1 A_2 A_3) = P(∅) = 0 is not equal to P(A_1) P(A_2) P(A_3) = 1/8. ♦
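The pairwise-but-not-mutual independence of Example 2.4.18 can be verified mechanically; the sketch below (added by us for illustration) checks every product condition over the four equally likely outcomes.

```python
from fractions import Fraction
from itertools import combinations

omega = {1, 2, 3, 4}
P = lambda E: Fraction(len(E & omega), len(omega))    # equally likely outcomes

A = [{1, 2}, {2, 3}, {1, 3}]

# every pair is independent ...
print(all(P(X & Y) == P(X) * P(Y) for X, Y in combinations(A, 2)))   # True
# ... but the triple is not
print(P(A[0] & A[1] & A[2]) == P(A[0]) * P(A[1]) * P(A[2]))          # False (0 vs 1/8)
```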
Solution When the circuit elements are connected in series, every circuit element
should function normally for the circuit to function normally. Thus, we have
P_S = p^n.   (2.4.20)
On the other hand, the circuit will function normally if at least one of the circuit
elements functions normally. Therefore, we get
P_P = 1 − (1 − p)^n   (2.4.21)
because the complement of the event that at least one of the circuit elements functions
normally is the event that all elements are malfunctioning. Note that 1 − (1 − p)n >
p n for n = 1, 2, . . . when p ∈ (0, 1). ♦
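A quick numerical look at (2.4.20) and (2.4.21), in a sketch of our own with p = 0.9 chosen only for illustration, shows how quickly the series and parallel configurations diverge as n grows.

```python
def reliability(p: float, n: int) -> tuple[float, float]:
    """Return (P_S, P_P): series and parallel reliability of n independent elements."""
    series = p ** n                  # every element must work, as in (2.4.20)
    parallel = 1 - (1 - p) ** n      # at least one element must work, as in (2.4.21)
    return series, parallel

for n in (1, 2, 5, 10):
    print(n, reliability(0.9, n))    # e.g. n = 5: (0.59049, 0.99999)
```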
In this section, we introduce the notions of probability mass functions and probability density functions (Kim 2010), which are equivalent to the probability measure for the description of a probability space and are more convenient tools when handling mathematical operations such as differentiation and integration.
p(ω) ≥ 0,  ω ∈ Ω   (2.5.1)
and
Σ_{ω∈Ω} p(ω) = 1.   (2.5.2)
From (2.5.1) and (2.5.2), we also have
p(ω) ≤ 1   (2.5.3)
for every ω ∈ Ω.
Example 2.5.2 For the sample space Ω = {x1 , x2 } and a number α ∈ (0, 1), the
function
p(x) = { 1 − α,  x = x_1;   α,  x = x_2 }   (2.5.7)
Definition 2.5.2 (Bernoulli trial) An experiment with two possible outcomes, i.e.,
an experiment for which the sample space has two elements, is called a Bernoulli
experiment or a Bernoulli trial.
which is called the binary pmf or Bernoulli pmf. The binary distribution is usually denoted by b(1, α), where 1 signifies the number of Bernoulli trials and α represents the probability of the desired event, i.e., success. ♦
Example 2.5.4 In the experiment of rolling a fair die, assume the events A = {1, 2, 3, 4} and A^c = {5, 6}. Then, if we choose A as the desired event, the distribution of A is b(1, 2/3). ♦
Example 2.5.5 When the sample space is Ω = J_n = {0, 1, ..., n − 1}, the pmf
p(k) = 1/n,  k ∈ J_n   (2.5.9)
is called a uniform pmf. ♦
Example 2.5.6 For the sample space Ω = {1, 2, . . .} and a number α ∈ (0, 1), the
pmf
is called a geometric pmf. The distribution represented by the geometric pmf (2.5.10)
is called the geometric distribution with parameter α and denoted by Geom(α). ♦
When a Bernoulli trial with probability α of success is repeated until the first
success, the distribution of the number of failures is Geom(α). In some cases, the
function p(k) = (1 − α)k−1 α for k ∈ {1, 2, . . .} with α ∈ (0, 1) is called the geo-
metric pmf. In such a case, the distribution of the number of repetitions is Geom(α)
when a Bernoulli trial with probability α of success is repeated until the first success.
Example 2.5.7 Based on the binary pmf discussed in Example 2.5.3, let us intro-
duce the binomial pmf. Consider the sample space Ω = Jn+1 = {0, 1, . . . , n} and a
number α ∈ (0, 1). Then, the function
p(k) = nC_k α^k (1 − α)^{n−k},  k ∈ J_{n+1}   (2.5.11)
Example 2.5.8 For the sample space Ω = J0 = {0, 1, . . .} and a number λ ∈ (0, ∞),
the function
p(k) = (λ^k / k!) e^{−λ},  k ∈ J_0   (2.5.12)
(Figure: binomial pmf p(k) with α = 0.4 for n = 10, 50, 100, and 150.)
(Fig. 2.8: Poisson pmf for λ = 0.5 and λ = 3.)
For the Poisson pmf (2.5.12), recollecting p(k+1)/p(k) = λ/(k+1), we have p(0) ≤ p(1) ≤ ··· ≤ p(λ − 1) = p(λ) ≥ p(λ + 1) ≥ p(λ + 2) ≥ ··· when λ is an integer, and p(0) ≤ p(1) ≤ ··· ≤ p(⌊λ⌋ − 1) ≤ p(⌊λ⌋) and p(⌊λ⌋) ≥ p(⌊λ⌋ + 1) ≥ ··· when λ is not an integer, where the floor function ⌊x⌋ is defined following (1.A.44) in Appendix 1.2. Figure 2.8 shows two examples of the Poisson pmf. The Poisson pmf will be discussed in more detail in Sect. 3.5.3.
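The monotonicity pattern of the Poisson pmf around ⌊λ⌋ described above can be inspected directly; the sketch below (ours) prints the ratios p(k + 1)/p(k) = λ/(k + 1) and the resulting mode for an integer and a non-integer λ.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)

for lam in (3.0, 2.6):
    pmf = [poisson_pmf(k, lam) for k in range(10)]
    ratios = [pmf[k + 1] / pmf[k] for k in range(9)]      # equals lam / (k + 1)
    mode = max(range(10), key=lambda k: pmf[k])
    print(lam, mode, [round(r, 3) for r in ratios[:5]])
# lam = 3.0: p(2) = p(3), so k = 2 and k = 3 are both modes; lam = 2.6: mode at k = 2
```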
Example 2.5.9 For the sample space Ω = J_0 = {0, 1, ...}, r ∈ (0, ∞), and α ∈ (0, 1), the function
p(x) = −rC_x α^r (α − 1)^x,  x ∈ J_0   (2.5.13)
is called a negative binomial (NB) pmf, and the distribution with the pmf (2.5.13) is denoted by NB(r, α). ♦
When r = 1, the NB pmf (2.5.13) is the geometric pmf discussed in Example 2.5.6. The NB pmf is called the Pascal pmf when r is a natural number and the Polya pmf when r is a positive real number.
The meaning of NB(r, α) and the formula of the NB pmf vary depending on
whether the sample space is {0, 1, . . .} or {r, r + 1, . . .}, whether r represents a
success or a failure, or whether α is the probability of success or failure. In (2.5.13),
the parameters r and α represent the number and probability of success, respectively.
When a Bernoulli trial with the probability α of success is repeated until the r -th
success, the distribution of the number of repetitions is NB(r, α).
We clearly have Σ_{x=0}^∞ p(x) = 1 because Σ_{x=0}^∞ −rC_x (α − 1)^x = (1 + α − 1)^{−r} = α^{−r} from (1.A.12) with p = −r and z = α − 1. Now, the pmf (2.5.13) can be written as p(x) = r+x−1C_x α^r (1 − α)^x or, equivalently, as
p(x) = r+x−1C_{r−1} α^r (1 − α)^x,  x ∈ J_0   (2.5.14)
using¹³ −rC_x = (1/x!)(−r)(−r − 1) ··· (−r − x + 1) = (−1)^x r+x−1C_x, because Σ_{x=0}^∞ r+x−1C_x (1 − α)^x = Σ_{x=0}^∞ {(r + x − 1)!/((r − 1)! x!)} (1 − α)^x = Σ_{x=0}^∞ r+x−1C_{r−1} (1 − α)^x and Σ_{x=0}^∞ p(x) = 1. Letting x + r = y in (2.5.14), we get
p(y) = y−1C_{r−1} α^r (1 − α)^{y−r},  y = r, r + 1, ...   (2.5.17)
when r is a natural number, which is sometimes also called the NB pmf. Here, note that x+r−1C_x |_{x=y−r} = y−1C_{y−r} = y−1C_{r−1}.
Let us now consider the continuous probability space with the measurable space
(Ω, F) = (R, B (R)): in other words, the sample space Ω is the set R of real numbers
and the event space is the Borel field B (R).
Definition 2.5.3 (probability density function) In a measurable space (R, B (R)), a
real-valued function f , with the two properties
f(r) ≥ 0,  r ∈ Ω   (2.5.18)
and
∫_Ω f(r) dr = 1   (2.5.19)
is the probability measure of the probability space on which f is defined. Note that
(2.5.20) is a counterpart of (2.5.5). While we have (2.5.6), an equation describing
the pmf in terms of the probability measure in the discrete probability space, we do
not have its counterpart in the continuous probability space, which would describe
the pdf in terms of the probability measure.
Do the integrals in (2.5.19) and (2.5.20) have any meaning? For interval events
or finite unions of interval events, we can adopt the Riemann integral as in most
engineering problems and calculations. On the other hand, the Riemann integral has
some caveats including that the order of the limit and integral for a sequence of
functions is not interchangeable. In addition, the Riemann integral is not defined in
some cases. For example, when
f(r) = { 1,  r ∈ [0, 1];   0,  otherwise, }   (2.5.21)
by adopting the Lebesgue integral. Compared to the Riemann integral, the Lebesgue
integral has the following three important advantages:
(1) The Lebesgue integral is defined for any Borel set.
(2) The order of the limit and integral can almost always be interchanged in the
Lebesgue integral.
(3) When a function is Riemann integrable, it is also Lebesgue integrable, and the
results are known to be the same.
Like the pmf, the pdf is defined on the points in the sample space, not on the
events. On the other hand, unlike the pmf p(·) for which p(ω) directly represents
the probability P({ω}), the value f (x0 ) at a point x0 of the pdf f (x) is not the
probability at x = x0 . Instead, f (x0 ) d x represents the probability for the arbitrarily
small interval [x0 , x0 + d x). While the value of a pmf cannot be larger than 1 at
any point, the value of a pdf can be larger than 1 at some points. In addition, the
probability of a countable event is 0 even when the value of the pdf is not 0 in the
continuous space: for the pdf
f(x) = { 2,  x ∈ [0, 0.5];   0,  otherwise, }   (2.5.22)
we have P({a}) = 0 for any point a ∈ [0, 0.5]. On the other hand, if we assume a
very small interval around a point, the probability of that interval can be expressed
as the product of the value of the pdf and the length of the interval. For example, for
a pdf f with f (3) = 4 the probability P([3, 3 + d x)) of an arbitrarily small interval
[3, 3 + d x) near 3 is
f (3)d x = 4d x. (2.5.23)
This implies that, as we can obtain the probability of an event by adding the proba-
bility mass over all points in the event in discrete probability spaces, we can obtain
the probability of an event by integrating the probability density over all points in
the event in continuous probability spaces.
Some of the widely-used pdf’s are shown in the examples below.
f(r) = {1/(b − a)} u(r − a) u(b − r)   (2.5.24)
shown in Fig. 2.9 is called a uniform pdf or a rectangular pdf, and its distribution is
denoted by14 U (a, b). ♦
14 Notations U [a, b], U [a, b), U (a, b], and U (a, b) are all used interchangeably.
(Fig. 2.9: uniform pdf of height 1/(b − a) on [a, b]; Fig. 2.10: exponential pdf with λ = 1.)
Example 2.5.12 (Romano and Siegel 1986) Countable sets are all of Lebesgue
measure 0. Some uncountable sets such as the Cantor set C described in Example
1.1.46 are also of Lebesgue measure 0. ♦
Example 2.5.13 The pdf
f(r) = λ e^{−λr} u(r)   (2.5.25)
shown in Fig. 2.10 is called an exponential pdf, with λ > 0 called the rate of the pdf. The exponential pdf with λ = 1 is called the standard exponential pdf. The exponential pdf will be discussed again in Sect. 3.5.4. ♦
f(r) = (λ/2) e^{−λ|r|}   (2.5.26)
with λ > 0, shown in Fig. 2.11, is called a Laplace pdf or a double exponential pdf,
and its distribution is denoted by L(λ). ♦
Example 2.5.15 The pdf
f(r) = {1/√(2πσ²)} exp{−(r − m)²/(2σ²)}   (2.5.27)
shown in Fig. 2.12 is called a Gaussian pdf or a normal pdf, and its distribution is denoted by N(m, σ²). ♦
(Fig. 2.11: Laplace pdf for λ = 1 and λ = 2; Fig. 2.12: Gaussian pdf for σ = σ_1 and σ = σ_2.)
When m = 0 and σ 2 = 1, the normal pdf is called the standard normal pdf. The
normal distribution is sometimes called the Gauss-Laplace distribution, de Moivre-
Laplace distribution, or the second Laplace distribution (Lukacs 1970). The normal
pdf will be addressed again in Sect. 3.5.1 and its generalizations into multidimen-
sional spaces in Chap. 5.
Example 2.5.16 For a positive number α and a real number β, the function
f(r) = (α/π) · 1/{(r − β)² + α²}   (2.5.28)
shown in Fig. 2.13 is called a Cauchy pdf and the distribution is denoted by C(β, α).
♦
The Cauchy pdf is also called the Lorentz pdf or Breit-Wigner pdf. We will mostly
consider the case β = 0, with the notation C(α) in this book.
(Figures: Cauchy pdf for α = α_1 and α = α_2; pdf curves for k = k_1 and k = k_2.)
f(r) = k e^{−kr} / (1 + e^{−kr})²   (2.5.30)
shown in Fig. 2.16 is called a gamma pdf and the distribution is denoted by G(α, β),
where α > 0 and β > 0. It is clear from (2.5.25) and (2.5.31) that the gamma pdf
with α = 1 is the same as an exponential pdf. ♦
(Fig. 2.16: gamma pdf for α = 1 and α = 2.)
f(r) = {r^{α−1} (1 − r)^{β−1} / B̃(α, β)} u(r) u(1 − r)   (2.5.32)
shown in Fig. 2.17 is called a beta pdf and the distribution is denoted by B(α, β),
where α > 0 and β > 0. ♦
f(r) = {1 / (π √(r(1 − r)))} u(r) u(1 − r).   (2.5.34)
Letting r = cos²v, we have ∫_0^1 f(r) dr = (1/π) ∫_{π/2}^0 {−2 cos v sin v / (cos v sin v)} dv = 1. The pdf (2.5.34) is also called the inverse sine pdf. ♦
(Fig. 2.17: beta pdf f(r) for (α, β) = (0.7, 0.3), (1, 3), (2, 5), (7, 2), (3, 1), and (2, 3).)
P(A) = Σ_{i=1}^∞ a_i P_i(A)   (2.5.35)
is also a probability measure on (Ω, F). When some of {P_i}_{i=1}^∞ are discrete while others are continuous, the probability measure (2.5.35) is called a mixed probability measure. An important example of the mixed probability measure is the sum
P(A) = λ Σ_{x∈A_d} p(x) + (1 − λ) ∫_{x∈A_c} f(x) dx   (2.5.36)
This example implies that a pdf can be defined also for discrete and mixed spaces by
using impulse functions. ♦
Example 2.5.24 The probability space with the pmf
p(r) = { 1/2,  r = 0;   1/3,  r = 1;   1/6,  r = 2;   0,  otherwise }   (2.5.38)
can also be described by the pdf
f(r) = (1/2) δ(r) + (1/3) δ(r − 1) + (1/6) δ(r − 2).   (2.5.39)
Here, for example, p(0) = ∫_{0−}^{0+} f(x) dx = 1/2. ♦
Note that what Example 2.5.24 implies is not that the pmf (2.5.38) is the same as
the pdf (2.5.39), but that a discrete probability space can be expressed in terms of
both a pmf and a pdf. If we have
∫_{a−}^{a+} f(x) dx = p(a)   (2.5.40)
for an integer a, then the pdf f (r ) expressed in terms of impulse functions and the
pmf p(r ) represent the same probability space. In other words, to check whether a
pmf p(r ) and a pdf f (r ) represent the same probability space or not, we are required
to check whether the pmf p(r ) and the pdf f (r ) satisfy (2.5.40).
Appendices
Theorem 2.A.1 For a monotonic sequence {Bn }∞ n=1 of events, the probability of the
limit event is equal to the limit of the probabilities of the events in the sequence. In
other words,
P(lim_{n→∞} B_n) = lim_{n→∞} P(B_n)   (2.A.1)
holds true.
Proof First, when {B_n}_{n=1}^∞ is a non-decreasing sequence, recollect that ⋃_{i=1}^n B_i = B_n and ⋃_{i=1}^∞ B_i = lim_{n→∞} B_n. Consider a sequence {F_i}_{i=1}^∞ such that F_1 = B_1 and F_n = B_n − ⋃_{i=1}^{n−1} B_i = B_n ∩ B_{n−1}^c for n = 2, 3, .... Then, {F_n}_{n=1}^∞ are all mutually exclusive, ⋃_{i=1}^n F_i = ⋃_{i=1}^n B_i for any natural number n, and ⋃_{i=1}^∞ F_i = ⋃_{i=1}^∞ B_i = lim_{n→∞} B_n. Therefore, P(lim_{n→∞} B_n) = P(⋃_{i=1}^∞ B_i) = P(⋃_{i=1}^∞ F_i) = Σ_{i=1}^∞ P(F_i) = lim_{n→∞} Σ_{i=1}^n P(F_i) = lim_{n→∞} P(⋃_{i=1}^n F_i) = lim_{n→∞} P(⋃_{i=1}^n B_i), i.e.,
P(lim_{n→∞} B_n) = lim_{n→∞} P(B_n)   (2.A.2)
recollecting (2.2.15), Axiom 4 of probability, and ⋃_{i=1}^n B_i = B_n.
Next, when {B_n}_{n=1}^∞ is a non-increasing sequence, {B_n^c}_{n=1}^∞ is a non-decreasing sequence, and thus we have
P(lim_{n→∞} B_n^c) = lim_{n→∞} P(B_n^c)   (2.A.3)
from (2.A.2). Noting that lim_{n→∞} B_n^c = ⋃_{i=1}^∞ B_i^c because {B_n^c}_{n=1}^∞ is a non-decreasing sequence and that ⋂_{i=1}^∞ B_i = lim_{n→∞} B_n because {B_n}_{n=1}^∞ is a non-increasing sequence, we have lim_{n→∞} B_n^c = ⋃_{i=1}^∞ B_i^c = (⋂_{i=1}^∞ B_i)^c = (lim_{n→∞} B_n)^c. Thus the left-hand side of (2.A.3) can be written as P(lim_{n→∞} B_n^c) = 1 − P(lim_{n→∞} B_n). Meanwhile, the right-hand side of (2.A.3) can easily be written as lim_{n→∞} P(B_n^c) = lim_{n→∞} {1 − P(B_n)} = 1 − lim_{n→∞} P(B_n). Then, (2.A.3) yields (2.A.1). ♠
for a non-increasing sequence {Bn }∞n=1 are called the continuity from below and
above of probability, respectively.
Theorem 2.A.1 deals with monotonic, i.e., non-decreasing and non-increasing,
sequences. The same result holds true more generally as we can see in the following
theorem:
holds true.
Proof First, recollect that, among the limit values of a sequence {a_n}_{n=1}^∞ of real numbers, the largest and smallest ones are denoted by lim sup_{n→∞} a_n and lim inf_{n→∞} a_n, respectively. When lim sup_{n→∞} a_n = lim inf_{n→∞} a_n, this value is called the limit of the sequence and denoted by lim_{n→∞} a_n. Now, noting that {⋃_{k=n}^∞ B_k}_{n=1}^∞ is a non-increasing sequence, we have P(lim sup_{n→∞} B_n) = P(⋂_{n=1}^∞ ⋃_{k=n}^∞ B_k), i.e.,
P(lim sup_{n→∞} B_n) = lim_{n→∞} P(⋃_{k=n}^∞ B_k)   (2.A.7)
because P(B_n) ≤ P(⋃_{k=n}^∞ B_k) from B_n ⊆ ⋃_{k=n}^∞ B_k. From (2.A.7) and (2.A.8), we get
lim sup_{n→∞} P(B_n) ≤ P(lim sup_{n→∞} B_n).   (2.A.9)
Similarly, we get
P(lim inf_{n→∞} B_n) = P(⋃_{n=1}^∞ ⋂_{k=n}^∞ B_k) = lim_{n→∞} P(⋂_{k=n}^∞ B_k) ≤ lim inf_{n→∞} P(B_n)   (2.A.10)
for the non-decreasing sequence {⋂_{k=n}^∞ B_k}_{n=1}^∞. The last line
lim_{n→∞} P(⋂_{k=n}^∞ B_k) ≤ lim inf_{n→∞} P(B_n)   (2.A.11)
of (2.A.10) is due to P(⋂_{k=n}^∞ B_k) ≤ P(B_n) from ⋂_{k=n}^∞ B_k ⊆ B_n. Now, (2.A.9) and (2.A.10) produce
lim sup_{n→∞} P(B_n) ≤ P(lim sup_{n→∞} B_n) = P(lim_{n→∞} B_n) = P(lim inf_{n→∞} B_n) ≤ lim inf_{n→∞} P(B_n)   (2.A.12)
and, consequently,
P(lim_{n→∞} B_n) = lim sup_{n→∞} P(B_n) = lim inf_{n→∞} P(B_n) = lim_{n→∞} P(B_n).   (2.A.13)
Let us discuss the Borel-Cantelli lemma, which deals with the probability of upper
bound events.
Theorem 2.A.3 (Rohatgi and Saleh 2001) When the sum of the probabilities {P(B_n)}_{n=1}^∞ of a sequence {B_n}_{n=1}^∞ of events is finite, i.e., when Σ_{n=1}^∞ P(B_n) < ∞, the probability P(lim sup_{n→∞} B_n) of the upper bound of {B_n}_{n=1}^∞ is 0.
Proof First, from Σ_{k=1}^∞ P(B_k) = lim_{n→∞} {Σ_{k=1}^{n−1} P(B_k) + Σ_{k=n}^∞ P(B_k)} = Σ_{k=1}^∞ P(B_k) + lim_{n→∞} Σ_{k=n}^∞ P(B_k), we get
lim_{n→∞} Σ_{k=n}^∞ P(B_k) = 0.   (2.A.15)
Now using (2.A.7) and the Boole inequality (2.3.2), we get P(lim sup_{n→∞} B_n) = lim_{n→∞} P(⋃_{k=n}^∞ B_k) ≤ lim_{n→∞} Σ_{k=n}^∞ P(B_k), i.e.,
P(lim sup_{n→∞} B_n) = 0   (2.A.16)
from (2.A.15). ♠
Theorem 2.A.4 When {B_n}_{n=1}^∞ is a sequence of independent events and the sum Σ_{n=1}^∞ P(B_n) is infinite, i.e., Σ_{n=1}^∞ P(B_n) → ∞, the probability P(lim sup_{n→∞} B_n) of the upper bound of {B_n}_{n=1}^∞ is 1.
Proof First, note that P(lim sup_{n→∞} B_n) = lim_{n→∞} P(⋃_{i=n}^∞ B_i), i.e.,
P(lim sup_{n→∞} B_n) = lim_{n→∞} {1 − P(⋂_{i=n}^∞ B_i^c)}   (2.A.17)
as in the proof of Theorem 2.A.3. Next, if Σ_{k=1}^∞ P(B_k) → ∞, then Σ_{k=n}^∞ P(B_k) → ∞ because Σ_{k=1}^{n−1} P(B_k) ≤ n − 1 for any number n and Σ_{k=1}^∞ P(B_k) = Σ_{k=1}^{n−1} P(B_k) + Σ_{k=n}^∞ P(B_k). Therefore, we get P(⋂_{i=n}^∞ B_i^c) = ∏_{i=n}^∞ P(B_i^c) = ∏_{i=n}^∞ {1 − P(B_i)} recollecting that {B_i}_{i=1}^∞ are independent of each other and, thus, {B_i^c}_{i=1}^∞ are independent of each other. Finally, noting that 1 − x ≤ e^{−x} for x ≥ 0, we get
P(⋂_{i=n}^∞ B_i^c) ≤ ∏_{i=n}^∞ exp{−P(B_i)} = exp{−Σ_{i=n}^∞ P(B_i)} = 0,   (2.A.18)
and therefore P(lim sup_{n→∞} B_n) = 1 from (2.A.17). ♠
The notions of length, area, volume, and weight that we encounter in our daily lives are examples of measures. The length of a rod, the area of a house, the volume of a ball, and the weight of a package assign numbers to objects. They also assign numbers to groups of objects.
A measure is a set function assigning a number to a set. Nonetheless, not all set
functions are measures. A measure should satisfy some conditions. For example, if
we consider the measure of weight, the weight of a bottle filled with water is the sum
of the weight of the bottle and that of the water. In other words, the measure of the
union of sets is equal to the sum of the measures of the sets for mutually exclusive
sets.
Definition 2.A.2 (measure) A non-negative additive function μ with the domain a
σ -algebra is called a measure.
Here, an additive function is a function such that the value of the function for a
countable union of sets is the same as the sum of the values of the function for the
sets when the sets are mutually exclusive. In other words, a function μ satisfying
μ(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ μ(A_i)   (2.A.19)
for countable mutually exclusive sets {A_i}_{i=1}^∞ in a σ-algebra is called an additive function.
Example 2.A.3 Consider a finite set Ω and the collection F = 2^Ω. Then, the number μ(A) of elements of A ∈ F is a measure. ♦
Theorem 2.A.5 For a measure μ on a σ-algebra F, let {A_n ∈ F}_{n=1}^∞ and A_1 ⊆ A_2 ⊆ ···. Then, A = ⋃_{n=1}^∞ A_n ∈ F and¹⁵ lim_{n→∞} μ(A_n) = μ(A).
Proof First, because F is a σ-algebra, A = ⋃_{n=1}^∞ A_n is an element of F. Next, let B_1 = A_1 and B_n = A_n − A_{n−1} for n = 2, 3, .... Then, {B_n}_{n=1}^∞ are mutually exclusive, A_n = ⋃_{i=1}^n B_i, and A = ⋃_{n=1}^∞ B_n. Thus, lim_{n→∞} μ(A_n) = lim_{n→∞} μ(⋃_{i=1}^n B_i) = lim_{n→∞} Σ_{i=1}^n μ(B_i) = Σ_{i=1}^∞ μ(B_i) and μ(A) = μ(⋃_{i=1}^∞ B_i) = Σ_{i=1}^∞ μ(B_i) from (2.A.19) and, consequently, lim_{n→∞} μ(A_n) = μ(A). ♠
The measure for a subset A of Ω can be defined as μ(A) = Σ_{ω∈A} μ_ω in an abstract space Ω by first choosing arbitrarily a non-negative number μ_ω for ω ∈ Ω when Ω is a countable set.
Example 2.A.4 For an abstract space Ω = {3, 4, 5}, let μ_ω = 5 − ω for ω ∈ Ω. Then, μ(A) = Σ_{ω∈A} μ_ω is a measure. We have μ({3}) = 2, μ({4}) = 1, μ({5}) = 0, μ({3, 4}) = μ({3}) + μ({4}) = 3, μ({3, 5}) = μ({3}) + μ({5}) = 2, μ({4, 5}) = μ({4}) + μ({5}) = 1, and μ({3, 4, 5}) = μ({3}) + μ({4}) + μ({5}) = μ({3, 4}) + μ({5}) = μ({4, 5}) + μ({3}) = μ({3, 5}) + μ({4}) = 3. ♦
15 Note that lim_{n→∞} A_n = ⋃_{n=1}^∞ A_n and lim_{n→∞} A_n = ⋂_{n=1}^∞ A_n when A_1 ⊆ A_2 ⊆ ··· and A_1 ⊇ A_2 ⊇ ···, respectively, as discussed in (1.5.8) and (1.5.9).
Example 2.A.5 Examples of an elementary set and a non-elementary set are shown
in Fig. 2.18. ♦
of Σ_{i=1}^∞ μ(A_i) over all the coverings of E is called the outer measure of E.
In general, we have
μ*(E) ≤ Σ_{i=1}^∞ μ*(B_i)   (2.A.21)
when E = ⋃_{i=1}^∞ B_i, and
Example 2.A.6 Assume the sets shown in Fig. 2.18 (Fig. 2.18: examples of an elementary set (1) and a non-elementary set (2) in two-dimensional space). Let the measure of the two-dimensional interval A_{a,b,c,d} = {(x_1, x_2) : a ≤ x_1 ≤ b, c ≤ x_2 ≤ d} be μ(A_{a,b,c,d}) = (b − a)(d − c). Let the set in Fig. 2.18 (1) be B_1. Then, we have B_1 ⊆ A_{0,2,0,2}, B_1 ⊆ A_{0,2,0,1} ∪ A_{1,2,0,2}, and B_1 ⊆ A_{0,2,0,1} ∪ A_{1,2,1,2}, among which the covering with the smallest measure is {A_{0,2,0,1}, A_{1,2,1,2}}. Thus, the outer measure of B_1 is μ*(B_1) = 2 + 1 = 3. Similarly, let the set in Fig. 2.18 (2) be B_2. Then, we have B_2 ⊆ A_{0,2,0,2}, B_2 ⊆ A_{0,2,0,1} ∪ A_{0,2,1,2}, B_2 ⊆ A_{0,2,0,1} ∪ A_{1,2,1,2}, ..., among which the covering with the smallest measure is {A_{2(i−1)/n, 2, 2(i−1)/n, 2i/n}}_{i=1}^n as n → ∞. Thus, the outer measure of B_2 is μ*(B_2) = 4 lim_{n→∞} Σ_{i=1}^n {1 − (i − 1)/n}(1/n) = 4 ∫_0^1 (1 − x) dx = 2. ♦
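The limit 4 lim_{n→∞} Σ_{i=1}^n {1 − (i − 1)/n}(1/n) = 2 computed for μ*(B_2) in Example 2.A.6 is a Riemann sum for 4∫_0^1 (1 − x) dx; the small numerical check below (ours) shows the covering measure decreasing toward 2.

```python
def covering_measure(n: int) -> float:
    """Total measure of the n rectangles covering B2 in Example 2.A.6."""
    return 4 * sum((1 - (i - 1) / n) * (1 / n) for i in range(1, n + 1))

for n in (10, 100, 1_000, 10_000):
    print(n, covering_measure(n))   # equals 2 + 2/n, decreasing toward 2 as n grows
```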
The collections of all finitely μ-measurable sets and μ-measurable sets are denoted
by M F (μ) and M(μ), respectively.
Theorem 2.A.6 The collection M(μ) is a σ -algebra, and the outer measure μ∗ is
an additive set function on M(μ).
Proof Instead of a rigorous proof, we will simply discuss a brief outline. Assume two sequences {A_i}_{i=1}^∞ and {B_i}_{i=1}^∞ of elementary sets converging to A and B, respectively, when A and B are elements of M_F(μ). If we let d(A, B) = μ*(AΔB), then we can show that M_F(μ) is an algebra by showing that A ∪ B and A ∩ B are included in M_F(μ) based on d(A_i ∪ A_j, B_i ∪ B_j) ≤ d(A_i, B_i) + d(A_j, B_j), d(A_i ∩ A_j, B_i ∩ B_j) ≤ d(A_i, B_i) + d(A_j, B_j), and |μ*(A) − μ*(B)| ≤ d(A, B).
μ(A) = Σ_{i=1}^n m(I_i)   (2.A.25)
for A = ⋃_{i=1}^n I_i, where {I_i}_{i=1}^n are non-overlapping intervals and
m(I) = ∏_{k=1}^p (b_k − a_k)   (2.A.26)
with I = {x = (x_1, x_2, ..., x_p) : a_k ≤ x_k ≤ b_k, k = 1, 2, ..., p} an interval in R^p.
Definition 2.A.7 is based on the fact that any elementary set can be obtained from a union of non-overlapping intervals {I_i}_{i=1}^n. An open set can be obtained from a countable union of open intervals and is a μ-measurable set. Similarly, a closed set
countable union of open intervals and is a μ-measurable set. Similarly, a closed set
is the complement of an open set and is also a μ-measurable set because M(μ) is
a σ -algebra. As discussed in Definition 2.2.7, the collection of all Borel sets is a σ -
algebra and is called the Borel σ -algebra or Borel field. In addition, a μ-measurable
set can always be expressed as the union of a Borel set and a set which is of measure 0
and is mutually exclusive of the Borel set. Under the Lebesgue measure, all countable
sets and some16 uncountable sets are of measure 0.
Example 2.A.7 In the one-dimensional space, the Lebesgue measure of an interval
[a, b] is the length μ ([a, b]) = b − a of the interval. The Lebesgue measure of the
set Q of rational numbers is μ(Q) = 0. ♦
Example 2.A.8 In the space X = R p , we have the Lebesgue measure and the col-
lection M of all sets measurable by the Lebesgue measure. Then, it is easy to see
that X is a measure space. ♦
Example 2.A.9 In the space X = J_+, let the number of elements in a set be the measure μ of the set and let the collection of all subsets of X be M. Then, (X, M, μ) is a measure space. ♦
Definition 2.A.9 (measurable function) When the set {x : f (x) > a} is always a
measurable set, a real function f defined on a measurable space is called a measurable
function.
K_E(x) = { 1,  x ∈ E;   0,  x ∉ E }   (2.A.27)
Theorem 2.A.7 There exists a sequence { f n }∞n=1 of simple functions such that
lim f n (x) = f (x) for any real function f defined on a measurable space. If f
n→∞
is a measurable function, then { f n }∞
n=1 can be chosen as a sequence of measurable
functions, and if f ≥ 0, { f n }∞
n=1 can be chosen to increase monotonically.
f_n(x) = Σ_{i=1}^{n2^n} {(i − 1)/2^n} K_{B_{n,i}}(x) + n K_{F_n}(x).   (2.A.28)
Assume the open unit interval J = (0, 1) and the set Q of rational numbers in the
real space R. Consider the translation operator Tt : R → R such that Tt (x) = x + t
for x ∈ R. Suppose the countable set Γt = Tt Q, i.e.,
Γt = {t + q : q ∈ Q}. (2.A.31)
Γ_t ∩ J ≠ ∅   (2.A.32)
because we can always find a rational number q such that 0 < t + q < 1 for
any real number t. We have Γ_t = {t + q : q ∈ Q} = {s + (t − s) + q : q ∈ Q} = {s + q′ : q′ ∈ Q} = Γ_s and Γ_t ∩ Γ_s = ∅ when t − s is a rational number and an irrational number, respectively. Based on this observation, consider the collection
Definition 2.A.12 (Vitali set) Based on the axiom of choice17 and (2.A.32), we can
obtain an uncountable set
V0 = {x : x ∈ Γt ∩ J, Γt ∈ K} , (2.A.34)
where x represents a number in the interval (0, 1) and an element of Γt ∈ K. The set
V0 is called the Vitali set.
Note that the points in the Vitali set V_0 are all in the interval (0, 1) and have a one-to-one correspondence with the sets in K. Denoting the enumeration of all the rational numbers in the interval (−1, 1) by {α_i}_{i=1}^∞, we get the following theorem:
Theorem 2.A.8 For the Vitali set V0 ,
(0, 1) ⊆ ⋃_{i=1}^∞ T_{α_i} V_0 ⊆ (−1, 2)   (2.A.35)
holds true.
Proof First, −1 < αi + x < 2 because −1 < αi < 1 and any point x in V0 satisfies
0 < x < 1. In other words, T_{α_i} x ∈ (−1, 2), and therefore
⋃_{i=1}^∞ T_{α_i} V_0 ⊆ (−1, 2).   (2.A.36)
Next, for any point x in (0, 1), x ∈ Γt with an appropriately chosen t as we have
observed in (2.A.32). Then, we have Γt = Γx and x ∈ Γt = Γx because x − t is a
rational number. Now, denoting a point in Γx ∩ V0 by y, we have y = x + q because
Γx ∩ V0 = ∅ and therefore y − x ∈ Q. Here, y − x is a rational number in (−1, 1)
because 0 < x, y < 1 and, consequently, we can put y − x = αi : in other words,
y = x + αi = Tαi x ∈ Tαi V0 . Thus, we have
(0, 1) ⊆ ⋃_{i=1}^∞ T_{α_i} V_0.   (2.A.37)
for i ≠ j.
17 The axiom of choice can be expressed as "For any set A, there exists a choice function f : 2^A → A such that f(B) ∈ B for every non-empty set B ⊆ A." The axiom of choice can be phrased in various ways, and the one used in Definition 2.A.12 is based on "If we assume a partition P_S of S composed only of non-empty sets, then there exists a set B whose intersection with any set in P_S is a singleton set."
Proof We prove the theorem by contradiction. Assume that the sets {T_{α_i} V_0}_{i=1}^∞ are measurable. Then, from the translation invariance¹⁸ of a measure, they have the same measure. Denoting the Lebesgue measure of T_{α_i} V_0 by μ(T_{α_i} V_0) = β, we have
μ((0, 1)) ≤ μ(⋃_{i=1}^∞ T_{α_i} V_0) ≤ μ((−1, 2))   (2.A.39)
from (2.A.35). Here, μ((0, 1)) = 1 and μ((−1, 2)) = 3. In addition, we have μ(⋃_{i=1}^∞ T_{α_i} V_0) = Σ_{i=1}^∞ μ(T_{α_i} V_0), i.e.,
μ(⋃_{i=1}^∞ T_{α_i} V_0) = Σ_{i=1}^∞ β   (2.A.40)
because {T_{α_i} V_0}_{i=1}^∞ is a collection of mutually exclusive sets as we have observed in (2.A.38). Combining (2.A.39) and (2.A.40) leads us to
1 ≤ Σ_{i=1}^∞ β ≤ 3,   (2.A.41)
which is impossible: the sum Σ_{i=1}^∞ β equals 0 when β = 0 and diverges when β > 0, so it can never lie between 1 and 3. This contradiction shows that V_0 is not Lebesgue measurable. ♠
Exercises
Exercise 2.1 Obtain the algebra generated from the collection C = {{a}, {b}} of the
set S = {a, b, c, d}.
Exercise 2.2 Obtain the σ -algebra generated from the collection C = {{a}, {b}} of
the set S = {a, b, c, d}.
18 For any real number x, the measure of A = {a} is the same as that of A + x = {a + x}.
Exercise 2.3 Obtain the sample space S in the following random experiments:
(1) An experiment measuring the lifetime of a battery.
(2) An experiment in which an integer n is selected in the interval [0, 2] and then
an integer m is selected in the interval [0, n].
(3) An experiment of checking the color of, and the number written on, a ball selected
randomly from a box containing two red, one green, and two blue balls denoted
by 1, 2, . . . , 5, respectively.
Exercise 2.5 Consider rolling a fair die. For A = {1}, B = {2, 4}, and C =
{1, 3, 5, 6}, obtain P(A ∪ B), P(A ∪ C), and P(A ∪ B ∪ C).
Exercise 2.6 Consider the events A = (−∞, r ] and B = (−∞, s] with r ≤ s in the
sample space of real numbers.
(1) Express C = (r, s] in terms of A and B.
(2) Show that B = A ∪ C and A ∩ C = ∅.
Exercise 2.7 When ten distinct red and ten distinct black balls are randomly arranged
into a single line, find the probability that red and black balls are placed in an
alternating fashion.
Exercise 2.8 Consider two branches between two nodes in a circuit. One of the two
branches is a resistor and the other is a series connection of two resistors. Obtain the
probability that the two nodes are disconnected assuming that the probability for a
resistor to be disconnected is p and disconnection in a resistor is not influenced by
the status of other resistors.
Exercise 2.9 Show that Ac and B are independent of each other and that Ac and B c
are independent of each other when A and B are independent of each other.
Exercise 2.10 Assume the sample space S = {1, 2, 3} and event space F = 2 S .
Show that no two events, except S and ∅, are independent of each other for any
probability measure such that P(1) > 0, P(2) > 0, and P(3) > 0.
Exercise 2.12 Among 100 lottery tickets sold each week, one is a winning ticket.
When a ticket costs 10 euros and we have 500 euros, does buying 50 tickets in one
week bring us a higher probability of getting the winning ticket than buying one
ticket over 50 weeks?
Exercise 2.13 In rolling a fair die twice, find the probability that the sum of the two
outcomes is 7 when we have 3 from the first rolling.
Exercise 2.14 When a pair of fair dice are rolled once, find P(a − 2b < 0), where
a and b are the face values of the two dice with a ≥ b.
Exercise 2.17 A box contains N balls each marked with a number 1, 2, . . ., and N ,
respectively. Each of N students with identification (ID) numbers 1, 2, . . ., and N ,
respectively, chooses a ball randomly from the box. If the number marked on the ball
and the ID number of the student are the same, then it is called a match.
(1) Find the probability of no match.
(2) Using conditional probability, obtain the probability in (1) again.
(3) Find the probability of k matches.
Exercise 2.18 In the interval [0, 1] on a line of real numbers, two points are chosen randomly. Find the probability that the distance between the two points is shorter than 1/2.
Exercise 2.19 Consider the probability space composed of the sample space S =
{all pairs (k, m) of natural numbers} and probability measure
Exercise 2.21 Three people shoot at a target. Let the event of a hit by the i-th
person be Ai for i = 1, 2, 3 and assume the three events are independent of each
other. When P (A1 ) = 0.7, P ( A2 ) = 0.9, and P (A3 ) = 0.8, find the probability that
only two people will hit the target.
Exercise 2.23 For three events A, B, and C, show the following results without
using Venn diagrams:
Exercise 2.24 For the sample space S and events E, F, and {B_i}_{i=1}^∞, show that the conditional probability satisfies the axioms of probability as follows:
(1) 0 ≤ P(E|F) ≤ 1.
(2) P(S|F) = 1.
(3) P(⋃_{i=1}^∞ B_i | F) = Σ_{i=1}^∞ P(B_i|F) when the events {B_i}_{i=1}^∞ are mutually exclusive.
(1) Explain whether or not A_i and A_j for i ≠ j are independent of each other.
(2) Obtain a partition of B using {A_i}_{i=1}^n.
(3) Show the total probability theorem P(B) = Σ_{i=1}^n P(B|A_i) P(A_i).
(4) Show the Bayes' theorem P(A_k|B) = P(B|A_k) P(A_k) / Σ_{i=1}^n P(B|A_i) P(A_i).
Exercise 2.26 Box 1 contains two red and three green balls and Box 2 contains one
red and four green balls. Obtain the probability of selecting a red ball when a ball is
selected from a randomly chosen box.
Exercise 2.28 A group of people elects one person via rock-paper-scissors. If there is only one person who wins, then the person is chosen; otherwise, the rock-paper-scissors is repeated. Assume that the probabilities of rock, paper, and scissors are each 1/3 for every person and are not affected by other people. Obtain the probability p_{n,k} that n people will elect one person in k trials.
Exercise 2.29 In an election, Candidates A and B will get n and m votes, respec-
tively. When n > m, find the probability that Candidate A will always have more
counts than Candidate B during the ballot-counting.
Exercise 2.30 A type O cell is cultured at time 0. After one hour, the cell will
become
A new type O cell behaves like the first type O cell and a type M cell will disappear in
one hour, where a change is not influenced by any other change. Find the probability
β_0 that no type M cell will appear until n + 1/2 hours from the starting time.
Exercise 2.31 Find the probability of the event A that 5 or 6 appears k times when
a fair die is rolled n times.
Exercise 2.32 Consider a communication channel for signals of binary digits (bits)
0 and 1. Due to the influence of noise, two types of errors can occur as shown in
Fig. 2.19: specifically, 0 and 1 can be identified to be 1 and 0, respectively. Let the
transmitted and received bits be X and Y , respectively. Assume a priori probability
of P (X = 1) = p for 1 and P (X = 0) = 1 − p for 0, and the effect of noise on a
bit is not influenced by that on other bits. Denote the probability that the received bit
is i when the transmitted bit is i by P ( Y = i| X = i) = pii for i = 0, 1.
(Fig. 2.19: binary channel diagram with crossover probabilities p_{01} and p_{10}, and p_{11} = 1 − p_{10}.)
(1) Obtain the probabilities p10 = P(Y = 0|X = 1) and p01 = P(Y = 1|X = 0)
that an error occurs when bits 1 and 0 are transmitted, respectively.
(2) Obtain the probability that an error occurs.
(3) Obtain the probabilities P(Y = 1) and P(Y = 0) that the received bit is identified
to be 1 and 0, respectively.
(4) Obtain all a posteriori probabilities P(X = j|Y = k) for j = 0, 1 and k = 0, 1.
(5) When p = 0.5, obtain P(X = 1|Y = 0), P(X = 1|Y = 1), P(Y = 1), and P(Y =
0) for a symmetric channel with p00 = p11 .
Exercise 2.33 Assume a pile of n integrated circuits (ICs), among which m are defective ones. When an IC is chosen randomly from the pile, the probability that the IC is defective is α_1 = m/n as shown in Example 2.3.8.
(1) Assume we pick one IC and then one more IC without replacing the first one
back to the pile. Obtain the probabilities α1,1 , α0,1 , α1,0 , and α0,0 that both are
defective, the first one is not defective and the second one is defective, the first
one is defective and the second one is not defective, and neither the first nor the
second one is defective, respectively.
(2) Now assume we pick one IC and then one more IC after replacing the first one
back to the pile. Obtain the probabilities α1,1 , α0,1 , α1,0 , and α0,0 again.
(3) Assume we pick two ICs randomly from the pile. Obtain the probabilities β0 , β1 ,
and β2 that neither is defective, one is defective and the other is not defective,
and both are defective, respectively.
Exercise 2.34 Box 1 contains two old and three new erasers and Box 2 contains one
old and six new erasers. We perform the experiment “choose one box randomly and
pick an eraser at random” twice, during which we discard the first eraser picked.
(1) Obtain the probabilities P2 , P1 , and P0 that both erasers are old, one is old and
the other is new, and both erasers are new, respectively.
(2) When both erasers are old, obtain the probability P3 that one is from Box 1 and
the other is from Box 2.
Exercise 2.35 The probability for a couple to have k children is αp k with 0 < p < 1.
(1) The color of the eyes being brown for a child is of probability b and is independent
of that of other children. Obtain the probability that the couple has r children
with brown eyes.
(2) Assuming that a child being a girl or a boy is of probability 1/2, obtain the probability that the couple has r boys.
(3) Assuming that a child being a girl or a boy is of probability 1/2, obtain the probability that the couple has at least two boys when the couple has at least one boy.
lim_{r→∞} p(x) = (λ^x/x!) e^{−λ},  (2.E.4)
which implies lim_{r→∞} NB(r, r/(r + λ)) = P(λ).
Exercise 2.37 A person plans to buy a car of price N units. The person has k units
and wishes to earn the remaining from a game. In the game, the person wins and
loses 1 unit when the outcome is a head and a tail, respectively, from a toss of a coin
with probability p for a head and q = 1 − p for a tail. Assuming 0 < k < N and
the person continues the game until the person earns enough for the car or loses all
the money, find the probability that the person loses all the money. This problem is
called the gambler’s ruin problem.
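For readers who want to check their answer, a minimal sketch is given below: it estimates the ruin probability by simulation and compares it with the standard closed-form expression for the gambler's ruin problem; the helper names and the sample values of k, N, and p are illustrative assumptions.

```python
import random

def ruin_probability(k, N, p, trials=100_000):
    """Monte Carlo estimate of the probability that the gambler loses all the money."""
    ruined = 0
    for _ in range(trials):
        money = k
        while 0 < money < N:
            money += 1 if random.random() < p else -1
        ruined += (money == 0)
    return ruined / trials

def ruin_exact(k, N, p):
    """Standard closed-form ruin probability with q = 1 - p."""
    q = 1 - p
    if p == q:
        return 1 - k / N
    r = q / p
    return (r**k - r**N) / (1 - r**N)

print(ruin_probability(3, 10, 0.5), ruin_exact(3, 10, 0.5))   # both near 0.7
print(ruin_probability(3, 10, 0.6), ruin_exact(3, 10, 0.6))
```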
Exercise 2.38 A large number of bundles, each with 25 tulip bulbs, are contained in
a large box. The bundles are of type R5 and R15 with portions 3/4 and 1/4, respectively.
A type R5 bundle contains five red and twenty white bulbs and a type R15 bundle
contains fifteen red and ten white bulbs. A bulb, chosen randomly from a bundle
selected at random from the box, is planted.
(1) Obtain the probability p1 that a red tulip blossoms.
(2) Obtain the probability p2 that a white tulip blossoms.
(3) When a red tulip blossoms, obtain the conditional probability that the bulb is
from a type R15 bundle.
Exercise 2.39 For a probability space with the sample space Ω = J0 = {0, 1, . . .} and pmf
p(x) = { 5c² + c, x = 0;  3 − 13c, x = 1;  c, x = 2;  0, otherwise },  (2.E.5)
for x > 0, where φ(x) denotes the standard normal pdf, i.e., (2.5.27) with m = 0 and σ² = 1, and
Q(x) = (1/√(2π)) ∫_x^∞ exp(−t²/2) dt.  (2.E.7)
Exercise 2.41 Balls with colors C1, C2, . . ., Cn are contained in k boxes. Let the probability of choosing Box B_j be P(B_j) = b_j and that of choosing a ball with color C_i from Box B_j be P(C_i | B_j) = c_{ij}, where Σ_{i=1}^{n} c_{ij} = 1 and Σ_{j=1}^{k} b_j = 1. A box is chosen first and then a ball is chosen from the box.
Exercise 2.42 Boxes 1, 2, and 3 contain four red and five green balls, one red
and one green balls, and one red and two green balls, respectively. Assume that
the probabilities of the event B_i of choosing Box i are P(B1) = P(B3) = 1/4 and P(B2) = 1/2. After a box is selected, a ball is chosen randomly from the box. Denote
the events that the ball is red and green by R and G, respectively.
(1) Are the events B1 and R independent of each other? Are the events B1 and G
independent of each other?
(2) Are the events B2 and R independent of each other? Are the events B3 and G
independent of each other?
Exercise 2.43 For the sample space Ω = {1, 2, 3, 4} with P(i) = 1/4 for i = 1, 2, 3, 4, consider A1 = {1, 3, 4}, A2 = {2, 3, 4}, and A3 = {3}. Are the three events A1, A2, and A3 independent of each other?
Exercise 2.44 Consider two consecutive experiments with possible outcomes A and
B for the first experiment and C and D for the second experiment. When P(AC) = 1/3, P(AD) = 1/6, P(BC) = 1/6, and P(BD) = 1/3, are A and C independent of each other?
Exercise 2.45 Two people make an appointment to meet between 10 and 11 o’clock.
Find the probability that they can meet assuming that each person arrives at the
meeting place between 10 and 11 o’clock independently and waits only up to 10
minutes.
Exercise 2.46 Consider two children. Assume any child can be a girl or a boy
equally likely. Find the probability p1 that both are boys when the elder is a boy and
the probability p2 that both are boys when at least one is a boy.
Exercise 2.47 There are three red and two green balls in Box 1, and four red and
three green balls in Box 2. A ball is randomly chosen from Box 1 and put into Box
2. Then, a ball is picked from Box 2. Find the probability that the ball picked from
Box 2 is red.
Exercise 2.48 Three people A, B, and C toss a coin each. The person whose outcome
is different from those of the other two wins. If the three outcomes are the same, then
the toss is repeated.
(1) Show that the game is fair, i.e., the probability of winning is the same for each
of the three people.
(2) Find the probabilities that B wins exactly eight times and at least eight times
when the coins are tossed ten times, not counting the number of no winner.
Exercise 2.49 A game called mighty can be played by three, four, or five players.
When it is played with five players, 53 cards are used by adding one joker to a deck
of 52 cards. Among the 53 cards, the ace of spades is called the mighty, except when
the suit of spades19 is declared the royal suit. In the play, ten cards are distributed to each of the five players {G_i}_{i=1}^{5} and the remaining three cards are left on the table,
face side down. Assume that what Player G 1 murmurs is always true and consider
the two cases (A) Player G 1 murmurs “Oh! I do not have the joker.” and (B) Player
G 1 murmurs “Oh! I have neither the mighty nor the joker.” For convenience, let the
three cards on the table be Player G 6 . Obtain the following probabilities and thereby
confirm Table 2.2:
(1) Player G i has the joker.
(2) Player G i has the mighty.
(3) Player G i has either the mighty or the joker.
(4) Player G_i has at least one of the mighty and the joker.
(5) Player G_i has both the mighty and the joker.
19 When the suit of spades is declared the royal suit, the ace of diamonds, not the ace of spades, becomes the mighty.
Exercise 2.50 In a group of 30 men and 20 women, 40% of men and 60% of women
play piano. When a person in the group plays piano, find the probability that the
person is a man.
Exercise 2.51 The probabilities that a car, a truck, and a bus pass through a toll gate are 0.5, 0.3, and 0.2, respectively. Find the probability that 30 cars, 15 trucks, and 5 buses have passed when 50 automobiles have passed the toll gate.
References
N. Balakrishnan, Handbook of the Logistic Distribution (Marcel Dekker, New York, 1992)
P.J. Bickel, K.A. Doksum, Mathematical Statistics (Holden-Day, San Francisco, 1977)
H.A. David, H.N. Nagaraja, Order Statistics, 3rd edn. (Wiley, New York, 2003)
R.M. Gray, L.D. Davisson, An Introduction to Statistical Signal Processing (Cambridge University
Press, Cambridge, 2010)
A. Gut, An Intermediate Course in Probability (Springer, New York, 1995)
C.W. Helstrom, Probability and Stochastic Processes for Engineers, 2nd edn. (Prentice-Hall, Engle-
wood Cliffs, 1991)
S. Kim, Mathematical Statistics (in Korean) (Freedom Academy, Paju, 2010)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn.
(Prentice Hall, New York, 2008)
M. Loeve, Probability Theory, 4th edn. (Springer, New York, 1977)
E. Lukacs, Characteristic Functions, 2nd edn. (Griffin, London, 1970)
T.M. Mills, Problems in Probability (World Scientific, Singapore, 2001)
M.M. Rao, Measure Theory and Integration, 2nd edn. (Marcel Dekker, New York, 2004)
V.K. Rohatgi, A.K.Md.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
S.M. Ross, A First Course in Probability (Macmillan, New York, 1976)
S.M. Ross, Stochastic Processes, 2nd edn. (Wiley, New York, 1996)
A.N. Shiryaev, Probability, 2nd edn. (Springer, New York, 1996)
A.A. Sveshnikov (ed.), Problems in Probability Theory, Mathematical Statistics and Theory of Random Functions (Dover, New York, 1968)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
P. Weirich, Conditional probabilities and probabilities given knowledge of a condition. Philos. Sci.
50(1), 82–95 (1983)
C.K. Wong, A note on mutually independent events. Am. Stat. 26(2), 27–28 (1972)
Chapter 3
Random Variables
Based on the description of probability in Chap. 2, let us now introduce and discuss
several topics on random variables: namely, the notions of the cumulative distribution
function, expected values, and moments. We will then discuss conditional distribution
and describe some of the widely-used distributions.
3.1 Distributions
Let us start by introducing the notion of the random variable and its distribution
(Gardner 1990; Leon-Garcia 2008; Papoulis and Pillai 2002). In describing the dis-
tributions of random variables, we adopt the notion of the cumulative distribution
function, which is a useful tool in characterizing the probabilistic properties of ran-
dom variables.
Generally, a random variable is a real function of which the domain is a sample space.
The range of a random variable X : Ω → R on a sample space Ω is S X = {x : x =
X (s), s ∈ Ω} ⊆ R. In fact, a random variable is not a variable but a function: yet, it
is customary to call it a variable. In many cases, a random variable is denoted by an
upper case alphabet such as X , Y , . . ..
Definition 3.1.1 (random variable) For a sample space Ω of the outcomes from a
random experiment, a function X that assigns a real number x = X (ω) to ω ∈ Ω is
called a random variable.
Definition 3.1.2 (measurable function) Given a probability space (Ω, F, P), a real-valued function g that maps the sample space Ω into the real numbers R is called a measurable function when the condition g⁻¹(B) ∈ F for every B ∈ B(R) is satisfied.
Example 3.1.1 A real-valued function g for which g −1 (D) is a Borel set for every
open set D is called a Borel function, and is a measurable function. ♦
Example 3.1.2 (Romano and Siegel 1986) For the sample space Ω = {1, 2, 3}
and event space F = {Ω, ∅, {3}, {1, 2}}, assume the function g such that g(1) = 1,
g(2) = 2, and g(3) = 3. Then, g is not a random variable because g⁻¹({1}) = {1} ∉ F although {1} ∈ B(R). ♦
Example 3.1.3 When the outcome from a rolling of a fair die is n, let X 1 (n) = n
and
X_2(n) = { 0, n is an odd number;  1, n is an even number. }  (3.1.2)
Example 3.1.4 The random variables L, Θ, and D defined below are all continuous
random variables.
(1) When (x, y) denotes the coordinate of a randomly selected point Q inside the unit circle centered at the origin O, the length L(Q) = √(x² + y²) of OQ and the angle Θ(Q) = tan⁻¹(y/x) formed by OQ and the positive x-axis.
(2) The difference D(r ) = |r − r̃ | between a randomly chosen real number r and
its rounded integer r̃ . ♦
X (g) = t (3.1.3)
for t ≥ 0. Here, because P(X = 0) > 0 and X is continuous for (0, ∞), X is a hybrid
random variable. ♦
Let X be a random variable defined on the probability space (Ω, F, P). Denote the
range of X by A and denote the inverse image of B by X −1 (B) for B ⊆ A. Then,
we have
P_X(B) = P(X⁻¹(B)),  (3.1.4)
which implies that the probability of an event is equal to the probability of the
inverse image of the event. Based on (3.1.4) and the probability measure P of the
original probability space (Ω, F, P), we can obtain the probability measure P X of
the probability space induced by the random variable X .
Example 3.1.6 Consider a rolling of a fair die and assume P(ω) = 1/6 for ω ∈ Ω = {1, 2, . . . , 6}. Define a random variable X by X(ω) = −1 for ω = 1, X(ω) = −2 for ω = 2, 3, 4, and X(ω) = −3 for ω = 5, 6. Then, we have A = {−3, −2, −1}. Logically, X⁻¹({−3}) = {5, 6}, X⁻¹({−2}) = {2, 3, 4}, and X⁻¹({−1}) = {1}. Now, the probability measure of the random variable X can be obtained as P_X({−3}) = P(X⁻¹({−3})) = P({5, 6}) = 1/3, P_X({−2}) = P({2, 3, 4}) = 1/2, and P_X({−1}) = P({1}) = 1/6. ♦
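The computation in Example 3.1.6 can be mirrored in a few lines of code; this is only an illustrative sketch of how P_X is induced from P through the inverse images, with exact fractions used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction

# Example 3.1.6 revisited: P_X is obtained by summing the probabilities of the
# inverse image of each value of X.
P = {w: Fraction(1, 6) for w in range(1, 7)}          # fair die
X = {1: -1, 2: -2, 3: -2, 4: -2, 5: -3, 6: -3}        # the random variable

P_X = defaultdict(Fraction)
for w, prob in P.items():
    P_X[X[w]] += prob

print(dict(P_X))   # {-1: 1/6, -2: 1/2, -3: 1/3}
```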
The inverse image X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B} shown in (3.1.5) is a subset of Ω and, at the same time, an element of the event space F due to the definition of a random variable. Based on the set X⁻¹(B) shown in (3.1.5), the distribution of the random variable X can be defined as follows: the function P_X(B) = P(X⁻¹(B)) for B ∈ B(A) represents the probability measure of X and is called the distribution of the random variable X, where A is the range of X and B(A) is the Borel field of A.
in terms of the pmf p X . For a continuous random variable X with range A and pdf
f X , we have
P_X(B) = ∫_B f_X(x) dx,  B ∈ B(A),  (3.1.9)
which is the counterpart of (3.1.8): note that the counterpart of (3.1.7) does not exist
for a continuous random variable.
The cdf FX (x) denotes the probability that X is located in the half-open interval
(−∞, x]. For example, FX (2) is the probability that X is in the half-open interval
(−∞, 2], i.e., the probability of the event {−∞ < X ≤ 2}.
The pmf and cdf for a discrete random variable and the pdf and cdf for a continuous
random variable can be expressed in terms of each other, as we shall see in (3.1.24),
(3.1.32), and (3.1.33) later. The probabilistic characteristics of a random variable can
be described by the cdf, pdf, or pmf: these three functions are all frequently indicated
as the distribution function, probability distribution function, or probability function.
In some cases, only the cdf is called the distribution function, and probability function
in the strict sense only indicates the probability measure P as mentioned in Sect. 2.2.3.
In some fields such as statistics, the name distribution function is frequently used
while the name cdf is widespread in other fields including engineering.
Example 3.1.7 Let the outcome from a rolling of a fair die be X . Then, we can
obtain the cdf FX (x) = P(X ≤ x) of X as
F_X(x) = P(X ≤ x) = { 1, x ≥ 6;  i/6, i ≤ x < i + 1, i = 1, 2, 3, 4, 5;  0, x < 1, }  (3.1.11)
Example 3.1.8 Let the coordinate Y be a number chosen randomly in the interval
[0, 1]. Then, P(Y ≤ x) = 1, x, and 0 when x ≥ 1, 0 ≤ x < 1, and x < 0, respec-
tively. Therefore, the cdf of Y is
F_Y(x) = { 1, x ≥ 1;  x, 0 ≤ x < 1;  0, x < 0. }  (3.1.12)
Theorem 3.1.1 The cdf is a non-decreasing function: that is, F (x1 ) ≤ F (x2 ) when
x1 < x2 for a cdf F. In addition, we have F(∞) = 1 and F(−∞) = 0.
From the definition of the cdf and probability measure, it is clear that
(Fig. 3.4: a cdf with a jump discontinuity at the point x_D.)
for a discrete or a hybrid random variable as shown in Fig. 3.4. On the other hand,
the probability of one point is 0 for a continuous random variable: in other words,
we have
P(X = x) = 0 (3.1.16)
and
F_X(x) − F_X(x⁻) = 0  (3.1.17)
Theorem 3.1.2 The cdf is continuous from the right. That is, F_X(x⁺) = F_X(x) for a cdf F_X.
Proof Consider a sequence {α_i}_{i=1}^∞ such that α_{i+1} ≤ α_i and lim_{i→∞} α_i = 0. Then, F_X(x + α_i) − F_X(x) = P(X ∈ (x, x + α_i]). Now, we have lim_{i→∞} P(X ∈ (x, x + α_i]) = lim_{i→∞} P_X((x, x + α_i]) = P_X(lim_{i→∞}(x, x + α_i]) from (2.A.1) because {(x, x + α_i]}_{i=1}^∞ is a monotonic sequence. Subsequently, we have P_X(lim_{i→∞}(x, x + α_i]) = P_X(∩_{i=1}^∞ (x, x + α_i]) = P_X(∅) from (1.5.9) and ∩_{i=1}^∞ (x, x + α_i] = ∅ as shown, for instance, in Example 1.5.9. In other words, lim_{i→∞} F_X(x + α_i) = F_X(x), which shows that the cdf F_X is continuous from the right. ♠
Example 3.1.9 (Loeve 1977) Let the probability measure and corresponding cdf be
P and F, respectively. When g is an integrable function,
∫ g dP  or  ∫ g dF  (3.1.21)
is called the Lebesgue-Stieltjes integral and is often written as, for instance,
∫_{[a,b)} g dP = ∫_a^b g dF.  (3.1.22)
When F(x) = x for x ∈ [0, 1], the measure P is called the Lebesgue measure as mentioned in Definition 2.A.7, and
∫_{[a,b)} g dx = ∫_a^b g dx.  (3.1.23)
As we have already seen in Examples 3.1.7 and 3.1.8, subscripts are used to
distinguish the cdf’s of several random variables as in FX and FY . In addition, when
the cdf FX and pdf f X is for the random variable X with the distribution P X , it is
denoted by X ∼ P X , X ∼ FX , or X ∼ f X . For example, X ∼ P(λ) means that the
random variable X follows the Poisson distribution with parameter λ, X ∼ U [a, b)
means that the distribution of the random variable X is the uniform distribution over
[a, b), and Y ∼ f Y (t) = e−t u(t) means that the pdf of the random variable Y is
f Y (t) = e−t u(t).
Theorem 3.1.3 A cdf may have at most countably many jump discontinuities.
Theorem 3.1.3 is a special case of the more general result that a monotonic real function, or a function which is continuous from the right-hand side or from the left-hand side at all points, may have at most countably many jump discontinuities.
Based on the properties of the cdf, we can now redefine the continuous, discrete,
and hybrid random variables as follows:
Here, when a function is increasing only at some points and is constant in a closed
interval not containing the points, the function is called a step-like function. The cdf
shown in Fig. 3.4 is an example of a hybrid random variable which is not continuous
at a point x D .
F_X(x) = α Σ_{k=−∞}^{x} p_X(k) + (1 − α) ∫_{−∞}^{x} f_X(y) dy,  (3.1.25)
which is sufficiently general for us to deal with in this book. Note that, as described
in Appendix 3.1, the most general cdf is a weighted sum of an absolutely continuous
function, a discrete function, and
a singular function.
The probability P_X(B) = ∫_B dF_X(x) of an event B can be obtained as
P_X(B) = { ∫_B f_X(x) dx, for a continuous random variable;  Σ_{x∈B} p_X(x), for a discrete random variable. }  (3.1.26)
Example 3.1.10 Consider a Rayleigh random variable R. Then, from the pdf f_R(x) = (x/α²) exp(−x²/(2α²)) u(x), the cdf F_R(x) = ∫_{−∞}^{x} (t/α²) exp(−t²/(2α²)) u(t) dt is easily obtained as
F_R(x) = {1 − exp(−x²/(2α²))} u(x).  (3.1.27)
When α = 1, the probability of the event {1 < R < 2} is ∫_1^2 f_R(t) dt = F_R(2) − F_R(1) = 1/√e − e⁻² ≈ 0.4712. ♦
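A minimal numerical check of Example 3.1.10, assuming α = 1 as in the example; the function name F_R below is ours.

```python
import math

# Check of Example 3.1.10 with alpha = 1: P(1 < R < 2) = F_R(2) - F_R(1).
def F_R(x, alpha=1.0):
    return 1.0 - math.exp(-x**2 / (2 * alpha**2)) if x > 0 else 0.0

prob = F_R(2) - F_R(1)
print(prob, 1 / math.sqrt(math.e) - math.exp(-2))   # both about 0.4712
```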
Proof First, P(X > x) = ∫_x^∞ f_X(y) dy = ∫_{−x}^{−∞} f_X(−t)(−dt) = ∫_{−∞}^{−x} f_X(t) dt = F_X(−x) because f_X(x) = f_X(−x). Recollecting (3.1.13), we get (3.1.28). ♠
Example 3.1.11 Consider the pdf f_L(x) = k e^{−kx}/(1 + e^{−kx})² for k > 0 of the logistic distribution (Balakrishnan 1992) and f_D(x) = (λ/2) e^{−λ|x|} for λ > 0 of the double exponential distribution. The cdf's of these distributions are
F_L(x) = 1/(1 + e^{−kx})  (3.1.29)
and
F_D(x) = { (1/2) e^{λx}, x ≤ 0;  1 − (1/2) e^{−λx}, x ≥ 0, }  (3.1.30)
respectively.
Similarly, the cdf is F_C(r) = 1/2 + (1/π) tan⁻¹((r − β)/α) for the Cauchy distribution with pdf f_C(r) = (α/π){(r − β)² + α²}⁻¹ shown in (2.5.28). ♦
From (3.1.24), we can easily see that the pdf and pmf can be obtained as
f_X(x) = (d/dx) F_X(x) = lim_{ε→0} (1/ε) P(x < X ≤ x + ε)  (3.1.32)
and
p_X(x) = F_X(x) − F_X(x⁻)  (3.1.33)
from the cdf when X is a continuous random variable and a discrete random variable,
respectively.
For a discrete random variable, a pmf is used normally. Yet, we can also define
the pdf of a discrete random variable using the impulse function as we have observed
in (2.5.37). Specifically, let the cdf and pmf of a discrete random variable X be F_X and p_X, respectively. Then, based on F_X(x) = Σ_{x_i ≤ x} p_X(x_i) = Σ_i p_X(x_i) u(x − x_i), we can regard
f_X(x) = (d/dx) Σ_i p_X(x_i) u(x − x_i) = Σ_i p_X(x_i) δ(x − x_i)  (3.1.34)
as the pdf of X.
Example 3.1.13 For the pdf f (x) = 2x for x ∈ [0, 1] and 0 otherwise, sketch the
cdf.
x
Solution Obtaining the cdf F(x) = ∫_{−∞}^{x} f(t) dt, we get
F(x) = { 0, x < 0;  x², 0 ≤ x < 1;  1, x ≥ 1, }  (3.1.35)
which is shown in Fig. 3.5. ♦
Example 3.1.14 For the pdf
f(x) = (1/2){u(x) − u(x − 1)} + (1/3) δ(x − 1) + (1/6) δ(x − 2),  (3.1.36)
sketch the cdf.
Fig. 3.5 The cdf F(x) for the pdf f(x) = 2x u(x) u(1 − x)
Fig. 3.6 The pdf f(x) = (1/2){u(x) − u(x − 1)} + (1/3) δ(x − 1) + (1/6) δ(x − 2) and the cdf F(x)
Solution First, we get the cdf F(x) = ∫_{−∞}^{x} f(t) dt as
F(x) = { 0, x < 0;  x/2, 0 ≤ x < 1;  5/6, 1 ≤ x < 2;  1, 2 ≤ x, }  (3.1.37)
which is shown in Fig. 3.6. ♦
Example 3.1.15 Let X be the face of a die from a rolling. Then, the cdf of X is F_X(x) = Σ_{i=1}^{6} (1/6) u(x − i), from which we get the pdf
f_X(x) = (1/6) Σ_{i=1}^{6} δ(x − i)  (3.1.38)
of X by differentiation. In addition,
p_X(i) = { 1/6, i = 1, 2, . . . , 6;  0, otherwise }  (3.1.39)
is the pmf of X. ♦
Example 3.1.16 The function (2.5.37) addressed in Example 2.5.23 is the pdf of a
hybrid random variable. ♦
Example 3.1.17 A box contains G green and B blue balls. Assume we take one ball from the box n times without replacement. Obtain the pmf of the number X of green balls among the n balls taken from the box.
Solution We easily get the probability of X = k as
P(X = k) = (G Ck)(B Cn−k) / (G+B Cn).  (3.1.40)
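The pmf (3.1.40) is hypergeometric and can be evaluated directly with binomial coefficients; the sample values G = 5, B = 7, and n = 4 below are illustrative assumptions.

```python
from math import comb

# pmf (3.1.40) of the number X of green balls when n balls are drawn without
# replacement from a box with G green and B blue balls (sample values).
def pmf_green(k, G=5, B=7, n=4):
    return comb(G, k) * comb(B, n - k) / comb(G + B, n)

probs = [pmf_green(k) for k in range(5)]
print(probs, sum(probs))   # the probabilities sum to 1
```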
Example 3.1.18 For the random variable Z with pdf f_Z(z) = (1/√(2π)) exp(−z²/2), we have P(|Z| ≤ 1) = ∫_{−1}^{1} f_Z(z) dz ≈ 0.6826, P(|Z| ≤ 2) ≈ 0.9544, and P(|Z| ≤ 3) ≈ 0.9974. ♦
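The three probabilities in Example 3.1.18 can be reproduced with the error function, since P(|Z| ≤ a) = erf(a/√2) for a standard normal Z; this is only a quick numerical check.

```python
import math

# Check of Example 3.1.18: P(|Z| <= a) = erf(a / sqrt(2)) for standard normal Z.
for a in (1, 2, 3):
    print(a, math.erf(a / math.sqrt(2)))   # about 0.6827, 0.9545, 0.9973
```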
Using (3.1.42), the value F(∞) = 1 mentioned in Theorem 3.1.1 can be confirmed as ∫_{−∞}^{∞} f(x) dx = P(−∞ < X ≤ ∞) = F(∞) = 1. Let us mention that although P(x₁ ≤ X < x₂) = ∫_{x₁⁻}^{x₂⁻} f_X(x) dx, P(x₁ ≤ X ≤ x₂) = ∫_{x₁⁻}^{x₂⁺} f_X(x) dx, and P(x₁ < X < x₂) = ∫_{x₁⁺}^{x₂⁻} f_X(x) dx are slightly different from (3.1.42), these four probabilities are all equal to each other unless the pdf f_X contains impulse functions at x₁ or x₂.
As it is observed, for instance, in Example 3.1.15, considering a continuous ran-
dom variable with the pdf is very similar to considering a discrete random variable
with the pmf. Therefore, we will henceforth focus on discussing a continuous random
variable with the pdf. One final point is that
and
hold true for all the pdf’s f we will discuss in this book.
3.2 Functions of Random Variables and Their Distributions
In this section, when the cdf F_X, pdf f_X, or pmf p_X of a random variable X is known, we obtain the probability functions of a new random variable Y = g(X), where g is a measurable function (Middleton 1960).
where A is the sample space of the random variable X . Using (3.2.1), the pdf or pmf
of Y can be obtained subsequently: specifically, we can obtain the pdf of Y as
f_Y(v) = (d/dv) F_Y(v)  (3.2.2)
when Y is a continuous random variable, and the pmf of Y as
p_Y(v) = F_Y(v) − F_Y(v − 1)  (3.2.3)
when Y is a discrete random variable. The result (3.2.3) is for a random variable whose
range is a subset of integers as described after Definition 3.1.4: more generally, we
can write it as
the cdf of Y = 2X + 1 is
F_Y(y) = { 0, y ≤ 1;  (y − 1)/2, 1 ≤ y ≤ 3;  1, y ≥ 3. }  (3.2.7)
(Figure: the cdf F_X(x) of X and the cdf F_Y(y) of Y = 2X + 1.)
Example 3.2.3 For a continuous random variable X with cdf FX , obtain the cdf of
Y = X1 .
Solution We get
F_Y(y) = P(X(yX − 1) ≥ 0)
       = { P(X ≤ 0 or X ≥ 1/y), y > 0;  P(X ≤ 0), y = 0;  P(1/y ≤ X ≤ 0), y < 0 }
       = { F_X(0) + 1 − F_X(1/y), y > 0;  F_X(0), y = 0;  F_X(0) − F_X(1/y), y < 0, }  (3.2.8)
by noting that {1/X ≤ y} = {X ≤ yX²} = {(yX − 1)X ≥ 0}. ♦
Example 3.2.4 Obtain the cdf of Y = a X 2 in terms of the cdf FX of X when a > 0.
Solution Because the set {Y ≤ y} is equivalent to the set {aX² ≤ y}, the cdf of Y can be obtained as F_Y(y) = P(Y ≤ y) = P(X² ≤ y/a), i.e.,
F_Y(y) = { 0, y < 0;  P(−√(y/a) ≤ X ≤ √(y/a)), y ≥ 0. }  (3.2.9)
Fig. 3.8 The cdf F_X(x) = {1 − (2/3) e^{−x}} u(x) and the cdf F_Y(x) = {1 − (2/3) e^{−√x}} u(x) of Y = X²
F_Y(x) = {1 − (2/3) e^{−√x}} u(x),  (3.2.11)
from F_Y(y) = P(Y ≤ y) = P(X ≤ √y). ♦
Example 3.2.7 Recollecting that the probability for a singleton set is 0 for a con-
tinuous random variable X , the cdf of Y = |X | can be obtained as FY (y) = P(Y ≤
y) = P(|X | ≤ y), i.e.,
F_Y(y) = { 0, y < 0;  P(−y ≤ X ≤ y), y ≥ 0 }
       = { 0, y < 0;  F_X(y) − F_X(−y) + P(X = −y), y ≥ 0 }
       = {F_X(y) − F_X(−y)} u(y)  (3.2.13)
in terms of the cdf FX of X . Examples of the cdf FX (x) and FY (y) are shown in
Fig. 3.9. ♦
Solution First, when y < −b and y ≥ b, we have FY (y) = 0 and FY (y) = FY (b) =
1, respectively. Next, when −b ≤ y < b, we have FY (y) = FX (y) from FY (y) =
P(Y ≤ y) = P(X ≤ y). Thus, we eventually have
F_Y(y) = { 1, y ≥ b;  F_X(y), −b ≤ y < b;  0, y < −b, }  (3.2.15)
which is continuous from the right-hand side at any point y and discontinuous at
y = ±b in general. ♦
Let us first introduce the following theorem which is quite useful in dealing with the
differentiation of an integrated bi-variate function:
Fig. 3.10 The cdf FX (x), limiter g(x), and cdf FY (y) of Y = g(X ) when X ∼ U (−1, 1)
Theorem 3.2.1 Assume that a(x) and b(x) are differentiable functions and that both g(t, x) and ∂g(t, x)/∂x are continuous in x and t. Then, we have
(d/dx) ∫_{a(x)}^{b(x)} g(t, x) dt = g(b(x), x) (db(x)/dx) − g(a(x), x) (da(x)/dx) + ∫_{a(x)}^{b(x)} (∂g(t, x)/∂x) dt,  (3.2.18)
because {Y ≤ y} = {X ≤ g⁻¹(y)}, where g⁻¹ is the inverse of g. Thus, the pdf of Y = g(X) is f_Y(y) = (d/dy) F_Y(y) = f_X(g⁻¹(y)) (dg⁻¹(y)/dy), i.e.,
f_Y(y) = f_X(x) (dx/dy)  (3.2.20)
when g is an increasing function. Similarly, when g is a decreasing function, we have F_Y(y) = 1 − F_X(g⁻¹(y)) from F_Y(y) = P(Y ≤ y) = P(X ≥ g⁻¹(y)), and the pdf is f_Y(y) = (d/dy) F_Y(y) = −f_X(g⁻¹(y)) (dg⁻¹(y)/dy), i.e.,
f_Y(y) = −f_X(x) (dx/dy).  (3.2.22)
The result (3.2.23) can be written as f_Y(y) = f_X(g⁻¹(y)) / |g′(g⁻¹(y))|, or as f_Y(y) = f_X(x) |dx/dy| evaluated at x = g⁻¹(y).
Fig. 3.11 The pdf f_X(x) and the pdf f_Y(y) of Y = 2X + 1 when X ∼ U[0, 1)
from (3.2.23), which can also be obtained by differentiating (3.2.8). Figure 3.12 shows the pdf f_X(x) and the pdf f_Y(y) of Y = 1/X when X ∼ U[0, 1). ♦
f_{√X}(y) = 2y f_X(y²) u(y),  (3.2.28)
which is the same as f_{√X}(y) = 2y f_X(y²) u(y) + F_X(0) δ(y), obtainable by differentiating F_{√X}(y) = F_X(y²) u(y) shown in (3.2.12), except at y = 0. Note that, for √X to be meaningful, we should have P(X < 0) = 0. Thus, when X is a continuous random variable, we have F_X(0) = P(X ≤ 0) = P(X = 0) = 0 and, consequently, F_X(0) δ(y) = 0. We then easily obtain3 ∫_{−∞}^{∞} f_{√X}(y) dy = ∫_0^∞ 2y f_X(y²) dy = ∫_0^∞ f_X(t) dt = ∫_{−∞}^{∞} f_X(t) dt = 1 because f_X(x) = 0 for x < 0 from P(X < 0) = 0. ♦
Example 3.2.16 Obtain the pdf of Y = √X when the pdf of X is
f_X(x) = { x, 0 ≤ x < 1;  2 − x, 1 ≤ x < 2;  0, x < 0 or x ≥ 2. }  (3.2.29)
we get
f_Y(y) = { 2y³, 0 ≤ y < 1;  2y(2 − y²), 1 ≤ y < √2;  0, y < 0 or y ≥ √2 }  (3.2.31)
3 We can equivalently obtain ∫_{−∞}^{∞} f_{√X}(y) dy = ∫_0^∞ 2y f_X(y²) dy + F_X(0) = ∫_0^∞ f_X(t) dt + ∫_{−∞}^{0} f_X(t) dt = ∫_{−∞}^{∞} f_X(t) dt = 1 using F_X(0) = ∫_{−∞}^{0} f_X(t) dt.
Fig. 3.13 The pdf f_Y(y) of Y = √X for the pdf f_X(x) of X
Fig. 3.14 The pdf f_X(x) of X ∼ N(0, σ²) and the pdf of the log-normal random variable Y = e^X
f_{e^X}(y) = (1/y) f_X(ln y) u(y),  (3.2.32)
assuming u(0) = 0. ♦
Example 3.2.18 When X ∼ N(m, σ²), obtain the distribution of Y = e^X.
Solution Noting that f_X(x) = (1/√(2πσ²)) exp(−(x − m)²/(2σ²)), we get
f_Y(y) = (1/(√(2πσ²) y)) exp(−(ln y − m)²/(2σ²)) u(y),  (3.2.33)
which is called the log-normal pdf. Figure 3.14 shows the pdf f_X(x) of X ∼ N(0, σ²) and the pdf (3.2.33) of the log-normal random variable Y = e^X. ♦
We now extend our discussion into the more general case where the transformation
y = g(x) has multiple solutions.
Theorem 3.2.3 When the solutions to y = g(x) are x1 , x2 , . . ., that is, when y =
g (x1 ) = g (x2 ) = · · · , the pdf of Y = g(X ) is obtained as
f_Y(y) = Σ_{i=1}^{∞} f_X(x_i) / |g′(x_i)|,  (3.2.34)
We now consider some examples for the application of the result (3.2.34).
Example 3.2.19 Obtain the pdf of Y = a X 2 for a > 0 in terms of the pdf f X of X .
Solution If y < 0, then the solution to y = ax² does not exist. Thus, f_Y(y) = 0. If y > 0, then the solutions to y = ax² are x₁ = √(y/a) and x₂ = −√(y/a). Thus, we have |g′(x₁)| = |g′(x₂)| = 2a√(y/a) from g′(x) = 2ax and, subsequently,
f_{aX²}(y) = (1/(2√(ay))) {f_X(√(y/a)) + f_X(−√(y/a))} u(y),  (3.2.35)
which is, as expected, the same as the result obtainable by differentiating the cdf (3.2.10) of Y = aX². ♦
Example 3.2.20 When X ∼ N(0, 1), we can easily obtain the pdf f_Y(y) = (1/√(2πy)) exp(−y/2) u(y) of Y = X² by noting that f_X(x) = (1/√(2π)) exp(−x²/2). ♦
4 This pdf is called the central chi-square pdf with the degree of freedom of 1. The central chi-square pdf, together with the non-central chi-square pdf, is discussed in Sect. 5.4.2.
obtained by differentiating the cdf FY (y) in (3.2.13), and then, noting that {FX (y)
−FX (−y)} δ(y) = {FX (0) − FX (0)} δ(y) = 0. ♦
Example 3.2.22 When X ∼ U [−π, π), obtain the pdf and cdf of Y = a sin(X + θ),
where a > 0 and θ are constants.
Solution First, we have f_Y(y) = 0 for |y| > a. When |y| < a, letting the two solutions to y = g(x) = a sin(x + θ) in the interval [−π, π) of x be x₁ and x₂, we have f_X(x₁) = f_X(x₂) = 1/(2π). Thus, recollecting that |g′(x)| = |a cos(x + θ)| = √(a² − y²), we get
f_Y(y) = (1/(π√(a² − y²))) u(a − |y|)  (3.2.38)
from (3.2.34). Next, let us obtain the cdf F_Y(y). When 0 ≤ y ≤ a, letting α = sin⁻¹(y/a) with 0 ≤ α ≤ π/2, we have x₁ = α − θ and x₂ = π − α − θ and, consequently, F_Y(y) = P(Y ≤ y) = P(−π ≤ X ≤ x₁) + P(x₂ ≤ X < π). Now, from P(−π ≤ X ≤ x₁) = (x₁ + π)/(2π) and P(x₂ ≤ X < π) = (π − x₂)/(2π), we have F_Y(y) = (2π + 2α − π)/(2π), i.e.,
F_Y(y) = 1/2 + (1/π) sin⁻¹(y/a).  (3.2.39)
Similarly, when −a ≤ y < 0, we again obtain
F_Y(y) = 1/2 + (1/π) sin⁻¹(y/a).  (3.2.40)
The cdf (3.2.41) can of course be obtained from the pdf (3.2.38) by integration: specifically, from F_Y(y) = ∫_{−∞}^{y} u(a − |t|)/(π√(a² − t²)) dt, we get F_Y(y) = 0 when y ≤ −a, F_Y(y) = (1/π) ∫_{−π/2}^{sin⁻¹(y/a)} {1/(a cos θ)} a cos θ dθ = (1/π){sin⁻¹(y/a) + π/2} = 1/2 + (1/π) sin⁻¹(y/a) when −a ≤ y ≤ a, and F_Y(y) = 1/2 + (1/π) sin⁻¹(1) = 1 when y ≥ a.
Fig. 3.15 The pdf f_X(x), the pdf f_Y(y) of Y = 2 sin(X + θ), and the cdf F_Y(y) of Y when X ∼ U[−π, π)
Figure 3.15 shows the pdf f_X(x), pdf f_Y(y), and cdf F_Y(y) when a = 2. Exercise 3.4 discusses a slightly
more general problem. ♦
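A short Monte Carlo sketch of Example 3.2.22: the empirical cdf of Y = a sin(X + θ) should agree with (3.2.41); the values of a, θ, and the sample size below are illustrative assumptions.

```python
import math
import random

# Empirical cdf of Y = a*sin(X + theta) with X ~ U[-pi, pi) versus
# F_Y(y) = 1/2 + (1/pi)*asin(y/a); a and theta are sample values.
a, theta, n = 2.0, 0.7, 200_000
samples = [a * math.sin(random.uniform(-math.pi, math.pi) + theta) for _ in range(n)]

for y in (-1.0, 0.0, 1.0):
    empirical = sum(s <= y for s in samples) / n
    exact = 0.5 + math.asin(y / a) / math.pi
    print(y, empirical, exact)
```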
Example 3.2.23 For a continuous random variable X with cdf FX , obtain the cdf,
pdf, and pmf of Z = sgn(X ), where
is called the sign function. First, we have the cdf FZ (z) = P(Z ≤ z) = P(sgn(X ) ≤
z) as
F_Z(z) = { 0, z < −1;  P(X ≤ 0), −1 ≤ z < 1;  1, z ≥ 1 }
       = F_X(0) u(z + 1) + {1 − F_X(0)} u(z − 1),  (3.2.43)
as the pmf of Z . ♦
Specifically, assume that the cdf F_X and the inverse F_Y⁻¹ of the cdf F_Y are continuous and increasing. Letting
Z = F_X(X),  (3.2.47)
we have X = F_X⁻¹(Z) and {F_X(X) ≤ z} = {X ≤ F_X⁻¹(z)} because F_X is continuous and increasing. Therefore, the cdf of Z is F_Z(z) = P(Z ≤ z) = P(F_X(X) ≤ z) = P(X ≤ F_X⁻¹(z)) = F_X(F_X⁻¹(z)) = z for 0 ≤ z < 1. In other words, we have
Z = F_X(X) ∼ U[0, 1).  (3.2.48)
Next, consider
V = F_Y⁻¹(Z).  (3.2.49)
Then, recollecting (3.2.48), we get the cdf P(V ≤ y) = P(F_Y⁻¹(Z) ≤ y) = P(Z ≤ F_Y(y)) = F_Z(F_Y(y)) = F_Y(y) of V because F_Z(x) = x for x ∈ (0, 1). In other words, when X ∼ F_X, we have V = F_Y⁻¹(Z) = F_Y⁻¹(F_X(X)) ∼ F_Y, which is summarized as the following theorem:
Theorem 3.2.4 The function that transforms a random variable X with cdf FX into
a random variable with cdf FY is g = FY−1 ◦ FX .
Figure 3.16 illustrates some of the interesting results such as
and
6 Here, because F_X is a continuous function, F_X(F_X⁻¹(z)) = z as it is discussed in (3.A.26).
Fig. 3.16 Transformation of a random variable X with cdf FX into Y with cdf FY
Theorem 3.2.4 can be used in the generation of random numbers, for instance.
Example 3.2.24 From X ∼ U[0, 1), obtain the Rayleigh random variable Y ∼ f_Y(y) = (y/α²) exp(−y²/(2α²)) u(y).
Solution Because the cdf of Y is F_Y(y) = {1 − exp(−y²/(2α²))} u(y), the function we are looking for is g(x) = F_Y⁻¹(x) = √(−2α² ln(1 − x)) as we can easily see from (3.2.51). In other words, if X ∼ U[0, 1), then Y = √(−2α² ln(1 − X)) has the cdf F_Y(y) = {1 − exp(−y²/(2α²))} u(y). Note that we conversely have V = 1 − exp(−Y²/(2α²)) ∼ U(0, 1). ♦
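Example 3.2.24 translates directly into an inverse-transform sampler; the sketch below assumes α = 1 by default and checks one point of the resulting cdf.

```python
import math
import random

# Inverse-transform generation of a Rayleigh random variable
# Y = sqrt(-2*alpha^2*ln(1 - X)) from X ~ U[0, 1), as in Example 3.2.24.
def rayleigh_sample(alpha=1.0):
    x = random.random()                       # X ~ U[0, 1)
    return math.sqrt(-2 * alpha**2 * math.log(1 - x))

samples = [rayleigh_sample() for _ in range(100_000)]
# empirical P(Y <= 1) versus the cdf F_Y(1) = 1 - exp(-1/2)
print(sum(s <= 1 for s in samples) / len(samples), 1 - math.exp(-0.5))
```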
p_Y(y_n) = P(Y = y_n) = { p_n, n = 1, 2, . . . ;  0, otherwise. }  (3.2.53)
Σ_{k=0}^{Y−1} p_k < X ≤ Σ_{k=0}^{Y} p_k  (3.2.54)
3.3 Expected Values and Moments
In this section, we attempt to introduce some of the key notions for use in such cases. Among the widely employed representative values, also called central values, for describing the probabilistic characteristics of a random variable and a distribution are the mean, median, and mode (Beckenbach and Bellman 1965; Bickel and Doksum 1977; Feller 1970; Hajek 1969; McDonough and Whalen 1995).
If f_X(x_mod) ≥ f_X(x) for a continuous random variable X, or p_X(x_mod) ≥ p_X(x) for a discrete random variable X, holds true for all real numbers x, then the value x_mod is called the mode of X.
The mode is the value that could happen most frequently among all the values of a
random variable. In other words, the mode is the most probable value or, equivalently,
the value at which the pmf or pdf of a random variable is maximum.
Roughly speaking, the median is the value at which the cumulative probability
is 0.5. When the distribution is symmetric, the point of symmetry of the cdf is the
median. The median is one of the quantiles of order p, or 100 p percentile, defined as
the number ξ p satisfying P(X ≤ ξ p ) ≥ p and P(X ≥ ξ p ) ≥ 1 − p for 0 < p < 1.
For a random variable X with cdf FX , we have
p ≤ F_X(ξ_p) ≤ p + P(X = ξ_p).  (3.3.3)
Therefore, if P X = ξ p = 0 as for a continuous random variable, the solution to
FX (x) = p is ξ p : the solution to this equation is unique when the cdf FX is a strictly
increasing function, but otherwise, there exist many solutions, each of which is the
quantile of order p.
The median and mode are not unique in some cases. When there exist many
medians, the middle value is regarded as the median in some cases.
Example 3.3.1 For the pmf p_X(1) = 1/3, p_X(2) = 1/2, and p_X(3) = 1/6, because P(X ≤ 2) = 1/3 + 1/2 = 5/6 ≥ 1/2 and P(X ≥ 2) = 1/2 + 1/6 = 2/3 ≥ 1/2, the median7 is 2. For the uniform distribution over the set {1, 2, 3, 4}, any real number in the interval8 [2, 3] is the median, and the mode is 1, 2, 3, or 4. ♦
7 Note that if the median x_med is defined by P(X ≤ x_med) = P(X ≥ x_med), we do not have the median in this pmf.
8 Note that if the median x_med is defined by P(X ≤ x_med) = P(X ≥ x_med), any real number in the interval (2, 3) is the median.
Example 3.3.2 For the distribution N(1, 1), the mode is 1. When the pmf is p_X(1) = 1/3, p_X(2) = 1/2, and p_X(3) = 1/6, the mode of X is 2. ♦
We now introduce the most widely used representative value, the expected value.
Definition 3.3.3 (expected value) For a random variable X with cdf F_X, the value E{X} = ∫_{−∞}^{∞} x dF_X(x), i.e.,
E{X} = { ∫_{−∞}^{∞} x f_X(x) dx, X a continuous random variable;  Σ_{x=−∞}^{∞} x p_X(x), X a discrete random variable }  (3.3.4)
is called the expected value or mean of X if ∫_{−∞}^{∞} |x| dF_X(x) < ∞.
The expected value is also called the stochastic average, statistical average, or ensem-
ble average, and E{X } is also written as E(X ) or E[X ].
Example 3.3.3 For X ∼ U[a, b), we have the expected value E{X} = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2 of X. The mode of X is any real number between a and b, and the median is the same as the mean (a + b)/2. ♦
Example 3.3.4 (Stoyanov 2013) For unimodal random variables, the median usually lies between the mode and mean: an example of exception is shown here. Assume the pdf
f(x) = { 0, x ≤ 0;  x, 0 < x ≤ c;  c e^{−λ(x−c)}, x > c }  (3.3.5)
of X with c ≥ 1 and c²/2 + c/λ = 1. Then, the mean, median, and mode of X are μ = c³/3 + c²/λ + c/λ², 1, and c, respectively. If we choose c > 1 sufficiently close to 1, then λ ≈ 2 and μ ≈ 13/12, and the median is smaller than the mean and mode although f(x) is unimodal. ♦
Theorem 3.3.1 (Stoyanov 2013) A necessary condition for the mean E{X} to exist for a random variable X with cdf F is lim_{x→∞} x{1 − F(x)} = 0.
Proof Rewrite x{1 − F(x)} as x{1 − F(x)} = x{∫_{−∞}^{∞} f(t) dt − ∫_{−∞}^{x} f(t) dt} = x ∫_x^∞ f(t) dt. Now, letting E{X} = m, we have
m = ∫_{−∞}^{x} t f(t) dt + ∫_x^∞ t f(t) dt ≥ ∫_{−∞}^{x} t f(t) dt + x ∫_x^∞ f(t) dt  (3.3.6)
for x > 0 because ∫_x^∞ t f(t) dt ≥ x ∫_x^∞ f(t) dt. Here, we should have lim_{x→∞} x ∫_x^∞ f(t) dt → 0 for (3.3.6) to hold true because lim_{x→∞} ∫_{−∞}^{x} t f(t) dt = ∫_{−∞}^{∞} t f(t) dt = m when x → ∞. ♠
Based on the result (3.E.2) shown in Exercise 3.1, we can show that (Rohatgi and
Saleh 2001)
E{X} = ∫_0^∞ P(X > x) dx − ∫_{−∞}^{0} P(X ≤ x) dx  (3.3.7)
for any continuous random variable X, dictating that a necessary and sufficient condition for E{|X|} < ∞ is that both ∫_0^∞ P(X > x) dx and ∫_{−∞}^{0} P(X ≤ x) dx converge.
Based on the discussions in the previous section, let us now consider the expected
values of functions of a random variable. Let FY be the cdf of Y = g(X ). Then, the
expected value of Y = g(X ) can be expressed as
E{Y} = ∫_{−∞}^{∞} y dF_Y(y).  (3.3.8)
In essence, the expected value of Y = g(X ) can be evaluated using (3.3.8) after we
have obtained the cdf, pdf, or pmf of Y from that of X. On the other hand, the expected value of Y = g(X) can be evaluated as E{Y} = E{g(X)} = ∫_{−∞}^{∞} g(x) dF_X(x), i.e.,
E{Y} = { ∫_{−∞}^{∞} g(x) f_X(x) dx, X a continuous random variable;  Σ_{x=−∞}^{∞} g(x) p_X(x), X a discrete random variable. }  (3.3.9)
While the first approach (3.3.8) of evaluating the expected value of Y = g(X ) requires
that we need to first obtain the cdf, pdf, or pmf of Y from that of X , the second
approach (3.3.9) does not require the cdf, pdf, or pmf of Y . In the second approach,
we simply multiply the pdf f X (x) or pmf p X (x) of X with g(x) and then integrate
or sum without first having to obtain the cdf, pdf, or pmf of Y . In short, if there is
no other reason to obtain the cdf, pdf, or pmf of Y = g(X ), the second approach is
faster in the evaluation of the expected value of Y = g(X ).
Example 3.3.5 When X ∼ U [0, 1), obtain the expected value of Y = X 2 .
Solution (Method 1) Based on (3.2.35), we can obtain the pdf f_Y(y) = (1/(2√y)) {f_X(√y) + f_X(−√y)} u(y) = (1/(2√y)) {u(y) − u(y − 1)} of Y. Next, using (3.3.8), we get E{Y} = ∫_0^1 y/(2√y) dy = (1/2) ∫_0^1 √y dy = 1/3.
(Method 2) Using (3.3.9), we can directly obtain E{Y} = ∫_0^1 x² dx = 1/3. ♦
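Both methods of Example 3.3.5 can be cross-checked with a crude Monte Carlo estimate of E{X²} for X ∼ U[0, 1); the sketch below is only a numerical confirmation.

```python
import random

# Monte Carlo estimate of E{Y} = E{X^2} for X ~ U[0, 1), as in Example 3.3.5.
n = 1_000_000
estimate = sum(random.random()**2 for _ in range(n)) / n
print(estimate, 1 / 3)   # both close to 0.3333
```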
From the definition of the expected value, we can deduce the following properties:
(1) When a random variable X is non-negative, i.e., when P(X ≥ 0) = 1, we have
E{X } ≥ 0.
(2) The expected value of a constant is the constant. In other words, if P(X = c) = 1, then E{X} = c.
(3) The expected value is a linear operator: that is, we have E{Σ_{i=1}^{n} a_i g_i(X)} = Σ_{i=1}^{n} a_i E{g_i(X)}.
(4) For any function h, we have |E{h(X )}| ≤ E{|h(X )|}.
(5) If h 1 (x) ≤ h 2 (x) for every point x, then we have E {h 1 (X )} ≤ E {h 2 (X )}.
(6) For any function h, we have min(h(X )) ≤ E{h(X )} ≤ max(h(X )).
Example 3.3.6 Based on (3) above, we have E{a X + b} = aE{X } + b when a and
b are constants. ♦
Example 3.3.7 For a continuous random variable X ∼ U(1, 9) and h(x) = 1/√x, compare h(E{X}), E{h(X)}, min(h(X)), and max(h(X)).
Solution We have h(E{X}) = h(5) = 1/√5 from the result in Example 3.3.3 and E{h(X)} = ∫_{−∞}^{∞} h(x) f_X(x) dx = ∫_1^9 1/(8√x) dx = 1/2 from (3.3.9). In addition, min(h(X)) = 1/√9 = 1/3 and max(h(X)) = 1/√1 = 1. Therefore, 1/3 < 1/2 < 1, i.e., min(h(X)) ≤ E{h(X)} ≤ max(h(X)), confirming (6). ♦
The expected value m_n = E{Xⁿ} is called the n-th moment of X if E{|X|ⁿ} = ∫_{−∞}^{∞} |x|ⁿ dF_X(x) < ∞ for n = 0, 1, . . ..
In other words, the expected value of a power of a random variable is called the
moment, and the moment is one of the expected values of a function of a random
variable. The n-th moment of X can specifically be written as
m_n = { ∫_{−∞}^{∞} xⁿ f_X(x) dx, X a continuous random variable;  Σ_{x=−∞}^{∞} xⁿ p_X(x), X a discrete random variable. }  (3.3.11)
Definition 3.3.5 (central moment) The expected value μ_n = E{(X − E{X})ⁿ}, i.e.,
μ_n = { ∫_{−∞}^{∞} (x − m₁)ⁿ f_X(x) dx, X a continuous random variable;  Σ_{x=−∞}^{∞} (x − m₁)ⁿ p_X(x), X a discrete random variable, }  (3.3.12)
is called the n-th central moment of X.
We have
μ_n = Σ_{k=0}^{n} nC_k m_k (−m₁)^{n−k}  (3.3.13)
from E{(X − m₁)ⁿ} = E{Σ_{k=0}^{n} nC_k X^k (−m₁)^{n−k}} and, conversely,
m_n = Σ_{k=0}^{n} nC_k μ_k m₁^{n−k}  (3.3.14)
from m_n = E[{(X − m₁) + m₁}ⁿ] = E{Σ_{k=0}^{n} nC_k (X − m₁)^k m₁^{n−k}} between the moments {m_n}_{n=0}^{∞} and the central moments {μ_n}_{n=0}^{∞}. Often, we also consider the absolute moment E{|X|ⁿ}.
Some of the moments and functions of moments are used more frequently than
the others in representing the probabilistic properties of a random variable. One such
important parameter is the variance.
E{X} = (a + b)/2  (3.3.16)
and variance
Var{X} = (b − a)²/12  (3.3.17)
from Var{X} = ∫_a^b x²/(b − a) dx − {(a + b)/2}². ♦
Example 3.3.9 For the exponential random variable X with pdf f(r) = λe^{−λr} u(r), the mean E{X} = λ ∫_0^∞ r e^{−λr} dr is
E{X} = 1/λ  (3.3.18)
i.e.,
E{X } = m, (3.3.20)
and the second moment E{X²} = (1/√(2πσ²)) ∫_{−∞}^{∞} (2σ²t² + 2√2 mσt + m²) √(2σ²) exp(−t²) dt = (1/√π) {∫_{−∞}^{∞} 2σ²t² exp(−t²) dt + m²√π}, i.e.,
E{X²} = (1/√π)(σ²√π + m²√π) = σ² + m².  (3.3.21)
Consequently, Var{X} = E{X²} − m² = σ². In (3.3.20) and (3.3.21), we have used
∫_{−∞}^{∞} t^k exp(−t²) dt = { √π, k = 0;  0, k = 1;  √π/2, k = 2. }  (3.3.22)
The first and third results in (3.3.22) can be shown easily by recollecting that the integration of the standard normal pdf over the entire real line is 1, i.e., ∫_{−∞}^{∞} (1/√(2π)) exp(−x²/2) dx = 1, with integration by parts. The second result is based on the facts that ∫_0^∞ t exp(−t²) dt < ∞ and that t exp(−t²) is an odd function. ♦
E{X } = np (3.3.23)
and variance
σ 2X = np(1 − p) (3.3.24)
Example 3.3.12 We have the mean E{X} = Σ_{k=1}^{∞} k e^{−λ} λ^k / k! = λ Σ_{k=0}^{∞} e^{−λ} λ^k / k!, i.e.,
E{X} = λ,  (3.3.25)
second moment E{X²} = λ² + λ, and variance
σ²_X = λ  (3.3.26)
and variance do not exist. Similarly, for a random variable with pmf
p(k) = { 6/(π²k²), k = 1, 2, . . . ;  0, otherwise, }  (3.3.27)
∫_{−∞}^{∞} exp(−αx²) dx = √(π/α),  (3.3.28)
which can also be obtained from ∫_{−∞}^{∞} √(α/π) exp(−αx²) dx = 1. Differentiating (3.3.28) k times with respect to α using (3.2.18), we get
∫_{−∞}^{∞} x^{2k} exp(−αx²) dx = {(2k − 1)!!/(2^k α^k)} √(π/α)  (3.3.29)
for k = 1, 2, . . ., which can also be obtained as ∫_{−∞}^{∞} x^{2k} exp(−αx²) dx = 2 ∫_0^∞ (t/α)^k e^{−t} dt/(2√(αt)) = {1/(α^k √α)} Γ(k + 1/2) = {(2k − 1)!!/(2^k α^k √α)} Γ(1/2), where Γ(1/2) = √π.
We specifically have
E{X⁴} = 3σ⁴  (3.3.32)
and
E{X⁶} = 15σ⁶.  (3.3.33)
In summary, we have
E{|X|ⁿ} = { 2^k k! √(2/π) σ^{2k+1}, n = 2k + 1, k = 0, 1, . . . ;  (n − 1)!! σⁿ, n is even, }  (3.3.35)
with which
E{|X|} = √(2/π) σ  (3.3.36)
and
E{|X|³} = √(8/π) σ³  (3.3.37)
can be confirmed. ♦
Example 3.3.15 (Romano and Siegel 1986; Stoyanov 2013) When the distribution is symmetric, i.e., when the cdf F satisfies F(−x) = 1 − F(x), all the odd-ordered moments are zero. On the other hand, the converse does not necessarily hold true. For example, consider the pdf
f_γ(x) = (1/24) exp(−x^{1/4}) {1 − γ sin(x^{1/4})} u(x)  (3.3.38)
with t = sin⁻¹(−b/√(a² + b²)), we can show that ∫_0^∞ x^k exp(−x^{1/4}) sin(x^{1/4}) dx = 0 for k = 0, 1, . . .. Thus, for any value of γ, the k-th moment is ∫_0^∞ (x^k/24) exp(−x^{1/4}) dx = (1/6) ∫_0^∞ v^{4k+3} e^{−v} dv = (1/6) Γ(4k + 4). Now, the pdf
Fig. 3.17 The pdf f_γ(x) = (1/24) exp(−x^{1/4}) {1 − γ sin(x^{1/4})} u(x): when γ ≠ β, although f(x) = (1/2){f_γ(x) u(x) + f_β(−x) u(−x)} is not symmetric, all odd-ordered moments are 0
f(x) = { (1/2) f_γ(x), x ≥ 0;  (1/2) f_β(−x), x < 0 }  (3.3.40)
for γ ≠ β is not an even function, and the (2n + 1)-st moment of f(x) is
∫_{−∞}^{∞} x^{2n+1} f(x) dx = (1/2) {∫_0^∞ x^{2n+1} f_γ(x) dx + ∫_{−∞}^{0} x^{2n+1} f_β(−x) dx}
                         = (1/2) {∫_0^∞ x^{2n+1} f_γ(x) dx − ∫_0^∞ x^{2n+1} f_β(x) dx}
                         = 0  (3.3.41)
because the moment of f_γ(x) is equal to that of f_β(x) at the same order. Specifically, when γ = 1 and β = −1, all the odd-ordered moments are 0 and the even-ordered moments are m_{2n} = (1/6)(8n + 3)!. The pdf (3.3.38) is shown in Fig. 3.17. ♦
We have discussed so far how we can obtain the moments based on the cdf, pdf, or
pmf. On the other hand, we can easily obtain the moments by using the Laplace or
Fourier transform as we can solve, for example, differential equations more easily
by using the Laplace transform.
The set {(1/√(2π)) e^{jωx}}_{ω∈R} of complex orthonormal basis functions has the property ⟨(1/√(2π)) e^{jωx}, (1/√(2π)) e^{jνx}⟩ = (1/(2π)) ∫_{−∞}^{∞} e^{jωx} e^{−jνx} dx = (1/(2π)) ∫_{−∞}^{∞} e^{j(ω−ν)x} dx, i.e.,
⟨(1/√(2π)) e^{jωx}, (1/√(2π)) e^{jνx}⟩ = δ(ω − ν),  (3.3.42)
which can be shown easily from, for example, (1.E.11), where j = √(−1). The Fourier transform H(ω) = F{h(x)} of h(x) and the inverse Fourier transform h(x) = F⁻¹{H(ω)} of H(ω) can be expressed as
H(ω) = ∫_{−∞}^{∞} h(x) e^{−jωx} dx  (3.3.43)
and
h(x) = (1/(2π)) ∫_{−∞}^{∞} H(ω) e^{jωx} dω,  (3.3.44)
respectively, based on the set {(1/√(2π)) e^{jωx}}_{ω∈R} of the orthonormal basis functions.
The function ϕ_X(ω) = ∫_{−∞}^{∞} e^{jωx} dF_X(x), which is the expected value ϕ_X(ω) = E{e^{jωX}}, is called the characteristic function (cf) of X.
Theorem 3.3.2 If the cdf’s of two random variables are the same, then their cf’s
are the same, and vice versa.
In other words, the cf is also a function with which we can characterize the
probabilistic properties of a random variable.
Example 3.3.16 For a geometric random variable with pmf p(k) = (1 − α)k α for
k ∈ {0, 1, . . .}, the cf is
ϕ(ω) = α / {1 − (1 − α) e^{jω}}.  (3.3.47)
If the pmf is p(k) = (1 − α)^{k−1} α for k ∈ {1, 2, . . .} for a geometric random variable, then the cf is ϕ(ω) = α e^{jω} / {1 − (1 − α) e^{jω}}. For the NB distribution with pmf (2.5.14), the cf is
ϕ(ω) = α^r / {1 − (1 − α) e^{jω}}^r  (3.3.48)
while the cf is ϕ(ω) = [α e^{jω} / {1 − (1 − α) e^{jω}}]^r if the NB distribution has pmf (2.5.17). ♦
Example 3.3.17 We have the cf ϕ(ω) = ∫_{−∞}^{∞} (1/(√(2π) σ)) exp{−(x² − 2mx + m²)/(2σ²) + jωx} dx = exp(−σ²ω²/2 + jmω) (1/(√(2π) σ)) ∫_{−∞}^{∞} exp{−(x − m − jωσ²)²/(2σ²)} dx, i.e.,
ϕ(ω) = exp(−σ²ω²/2 + jmω)  (3.3.49)
Σ_{l=1}^{n} Σ_{k=1}^{n} z_l z_k* ϕ(ω_l − ω_k) ≥ 0,  (3.3.50)
where {ω_k}_{k=1}^{n} are real numbers and {z_k}_{k=1}^{n} are complex numbers.
Proof
(2) Consider a real number μ₀ such that 0 < μ₀ < ε/2 for a positive real number ε. Assuming a periodic function b̄(y) with period π and b̄(y) = |y| + μ₀ for |y| ≤ π/2, we have
E{b̄(νX)} = Σ_{n=−∞}^{∞} ∫_{nπ−π/2}^{nπ+π/2} {μ₀ + |ν(x − nπ)|} f_X(x) dx = |ν|(μ̄ − μ₀) + μ₀  (3.3.52)
from |e^{jαX} − e^{jβX}| = [{cos(αX) − cos(βX)}² + {sin(αX) − sin(βX)}²]^{1/2} = √(2 − 2 cos{(α − β)X}). If we let δ = (ε − 2μ₀)/μ̄, then 0 < δ < ∞. Therefore, from (3.3.52) and (3.3.53), we get |ϕ(α) − ϕ(β)| = |E{e^{jαX} − e^{jβX}}| ≤ E{|e^{jαX} − e^{jβX}|} ≤ |α − β|(μ̄ − μ₀) + 2μ₀ < {(ε − 2μ₀)/μ̄}(μ̄ − μ₀) + 2μ₀, i.e.,
|ϕ(α) − ϕ(β)| < { {(ε − 2μ₀)/(μ̄ − μ₀)}(μ̄ − μ₀) + 2μ₀, if μ̄ − μ₀ ≠ 0;  2μ₀, if μ̄ − μ₀ = 0 } ≤ ε  (3.3.54)
when |α − β| < δ. Thus, for any ε > 0, we have |ϕ(α) − ϕ(β)| < ε if |α − β| < δ = (ε − 2μ₀)/μ̄, implying that ϕ(ω) is uniformly continuous at every real number ω.
(3) For a random variable with cf ϕ(ω) and pdf f, we have
Σ_{l=1}^{n} Σ_{k=1}^{n} z_l z_k* ϕ(ω_l − ω_k) = E{|Σ_{l=1}^{n} z_l e^{jω_l X}|²} ≥ 0  (3.3.55)
because Σ_{l=1}^{n} Σ_{k=1}^{n} z_l z_k* ϕ(ω_l − ω_k) = Σ_{l=1}^{n} Σ_{k=1}^{n} z_l z_k* ∫_{−∞}^{∞} e^{j(ω_l−ω_k)x} f(x) dx = ∫_{−∞}^{∞} {Σ_{l=1}^{n} z_l e^{jω_l x}} {Σ_{k=1}^{n} z_k* e^{−jω_k x}} f(x) dx. ♠
N(m, σ²) in Example 3.3.17. Based on this result and (3.3.56), we can obtain the cf of Y = (X − m)/σ as ϕ_Y(ω) = exp(−jωm/σ) exp{−(σ²/2)(ω/σ)² + jm(ω/σ)}, i.e.,
ϕ_Y(ω) = exp(−ω²/2).  (3.3.57)
This result implies that if X ∼ N(m, σ²), then (X − m)/σ ∼ N(0, 1). ♦
The function M_X(t) = ∫_{−∞}^{∞} e^{tx} dF_X(x), which is the expected value M_X(t) = E{e^{tX}}, is called the moment generating function (mgf) of X. The cf and mgf are related through ϕ_X(ω) = M_X(jω), implying that the cf and mgf are basically the same in the sense that, by taking
the inverse transform of the cf or mgf, we can obtain the cdf. Normally, the cf is
guaranteed its convergence whereas the convergence region of the mgf should be
considered for the inverse transform, and for some distributions the mgf does not
exist.
Based on the discussion in Sect. 3.2 and Definitions 3.3.7 and 3.3.8, the cf of Y = g(X) can be obtained as ϕ_Y(ω) = E{e^{jωY}} = ∫_{−∞}^{∞} e^{jωy} dF_Y(y), i.e.,
ϕ_Y(ω) = E{e^{jωg(X)}} = ∫_{−∞}^{∞} e^{jωg(x)} dF_X(x).  (3.3.60)
from the definition of the cf. In other words, the cf is the complex conjugate of the Fourier transform of the pdf. Hence, we can obtain the pdf from the cf by an inverse transform. Next, the cdf can be expressed as the convolution F(x) = (f ∗ u)(x) of the pdf f(x) and the unit step function u(x). The Fourier transform of the convolution of two functions is the product of the Fourier transforms of the two functions.
Noting that the Fourier transform of the unit step function u(x) is
F{u(x)} = πδ(ω) + 1/(jω),  (3.3.65)
we get
F{F(x)} = πϕ(0)δ(ω) + ϕ(−ω)/(jω).  (3.3.66)
Inverse transforming (3.3.66), the cdf F(x) = (1/(2π)) ∫_{−∞}^{∞} {πϕ(0)δ(ω) + ϕ(−ω)/(jω)} exp(jωx) dω can be expressed as (Papoulis 1962)
F(x) = ϕ(0)/2 + (1/(2πj)) ∫_{−∞}^{∞} {ϕ(−ω)/ω} exp(jωx) dω
     = ϕ(0)/2 + (j/(2π)) ∫_{−∞}^{∞} {ϕ(ω)/ω} exp(−jωx) dω.  (3.3.67)
3.3.4.5 Cumulants
Expanding the natural logarithm ψ(ω) = ln ϕ(ω) = ln{1 + Σ_{s=1}^{∞} (jω)^s m_s/s!} of the cf ϕ(ω) in the power series of jω near ω = 0, we get
ψ(ω) = Σ_{s=1}^{∞} (jω)^s m_s/s! − (1/2) {Σ_{s=1}^{∞} (jω)^s m_s/s!}² + (1/3) {Σ_{s=1}^{∞} (jω)^s m_s/s!}³ − · · ·
     = m₁ (jω)/1! + (m₂ − m₁²) (jω)²/2! + (m₃ − 3m₁m₂ + 2m₁³) (jω)³/3! + · · ·
     = Σ_{n=1}^{∞} k_n (jω)ⁿ/n!,  (3.3.68)
Example 3.3.19 The first, second, and third cumulants are the same as the mean k₁ = m₁, the variance k₂ = m₂ − m₁² = σ², and the third central moment k₃ = m₃ − 3m₂m₁ + 2m₁³ = μ₃, respectively. In addition, the fourth cumulant is k₄ = m₄ − 4m₃m₁ − 3m₂² + 12m₂m₁² − 6m₁⁴ = μ₄ − 3(m₂ − m₁²)² = μ₄ − 3σ⁴. ♦
Definition 3.3.10 (coefficient of variation; skewness; kurtosis) Let the mean, variance, n-th central moment, and n-th cumulant be m, σ², μ_n, and k_n, respectively. Then, v₁ = σ/m, v₂ = μ₃/σ³ = k₃/k₂^{3/2}, and v₃ = μ₄/σ⁴ = 3 + k₄/k₂² are called the coefficient of variation, skewness, and kurtosis, respectively.
(Figure: distributions with positive skewness v₂ > 0 and with negative skewness v₂ < 0.)
Example 3.3.21 Assume the pdf f_X(x) = λ exp(−λx) u(x) of X. Then, as we have observed in Example 3.3.9, we have E{X} = m = 1/λ, μ₂ = Var{X} = σ² = 1/λ², and μ₃ = ∫_0^∞ (x − 1/λ)³ λe^{−λx} dx = −{(x − 1/λ)³ + (3/λ)(x − 1/λ)² + (6/λ²)(x − 1/λ) + 6/λ³} e^{−λx} |_0^∞ = 2/λ³. Therefore, the coefficient of variation is v₁ = σ/m = (1/λ)(1/λ)⁻¹ = 1 and the skewness is v₂ = μ₃/σ³ = (2/λ³)(1/λ³)⁻¹ = 2. ♦
The kurtosis v3 represents how sharp the peak is when compared to the normal
distribution: when v3 = 3, the sharpness of the peak of the distribution is the same
as that of the normal distribution, when v3 < 3, the distribution is less sharp than the
normal distribution, called platykurtic or mild peak, and when v3 > 3, the distribution
is sharper than the normal distribution, called leptokurtic or sharp peak.
Example 3.3.22 When the pdf is f_X(x) = λ exp(−λx) u(x) for X, σ = 1/λ as we have observed in Example 3.3.21. In addition, μ₄ = ∫_0^∞ (x − 1/λ)⁴ λe^{−λx} dx = −{(x − 1/λ)⁴ + (4/λ)(x − 1/λ)³ + (12/λ²)(x − 1/λ)² + (24/λ³)(x − 1/λ) + 24/λ⁴} e^{−λx} |_0^∞, i.e.,
μ₄ = 9/λ⁴.  (3.3.70)
Thus, the kurtosis is v₃ = μ₄/σ⁴ = (9/λ⁴)(1/λ⁴)⁻¹ = 9. ♦
When we obtain the moments such as the mean and variance, we need to evaluate
one integral for each of the moments. While the number of integration is the same as
the number of moments that we want to obtain based on the definition of moments,
we can first obtain the cf or mgf by one integration and then obtain the moments by
differentiation if we use the moment theorem: note that differentiation is easier in
general to evaluate than integration.
Theorem 3.3.4 The k-th moment of X can be obtained as
m_k = j^{−k} (∂^k/∂ω^k) ϕ_X(ω)|_{ω=0} = j^{−k} ϕ_X^{(k)}(0)  (3.3.71)
or
m_k = M_X^{(k)}(0)  (3.3.72)
in terms of the cf or the mgf, respectively.
Proof First, if we evaluate E{X^k} = ∫_{−∞}^{∞} x^k f_X(x) dx = ∫_{−∞}^{∞} f_X(x) (1/j^k) (∂^k/∂ω^k) e^{jωx}|_{ω=0} dx = j^{−k} (∂^k/∂ω^k) ∫_{−∞}^{∞} f_X(x) e^{jωx} dx|_{ω=0} recollecting j^{−k} (∂^k/∂ω^k) e^{jωx}|_{ω=0} = x^k, we get
E{X^k} = j^{−k} (∂^k/∂ω^k) ϕ_X(ω)|_{ω=0}.  (3.3.73)
Similarly, by differentiating the mgf M_X(t) k times, we get M_X^{(k)}(t) = E{X^k e^{tX}} and, subsequently, the desired result. ♠
Theorem 3.3.4 is referred to as the moment theorem.
Example 3.3.23 For X ∼ N(m, σ²), we have ϕ_X(ω) = exp(−ω²σ²/2 + jmω) as observed in Example 3.3.17. Thus, E{X} = j⁻¹ ϕ_X′(0) = m, E{X²} = j⁻² ϕ_X″(0) = m² + σ², and Var{X} = σ². ♦
In evaluating the moments of discrete random variables via the moment theorem, it
is often convenient to let z = e jω and s = et when using the cf and mgf, respectively.
Example 3.3.25 For K ∼ b(n, p), the cf is ϕ_K(ω) = Σ_{k=0}^{n} nC_k p^k (1 − p)^{n−k} e^{jkω}, i.e.,
ϕ_K(ω) = {p e^{jω} + (1 − p)}ⁿ.  (3.3.74)
Now, letting e^{jω} = z and writing the cf as γ_K(z) = ϕ_K(ω)|_{e^{jω}=z}, we get γ_K(z) = {pz + (1 − p)}ⁿ. We then have (∂^i/∂z^i) γ_K(z)|_{z=1} = E{K(K − 1) · · · (K − i + 1)} because (∂^i/∂z^i) γ_K(z) = (∂^i/∂z^i) E{z^K} = E{K(K − 1) · · · (K − i + 1) z^{K−i}}. Therefore, γ_K(1) = 1, γ_K′(1) = E{K} = np, and γ_K″(1) = E{K²} − E{K} = n(n − 1)p². From these results, we have E{K} = np and Var{K} = np(1 − p). ♦
Example 3.3.26 Consider K ∼ P(λ). Then, γ_K(z) = Σ_{k=0}^{∞} e^{−λ} (λz)^k / k! = e^{λ(z−1)} and ϕ_K(ω) = exp{λ(e^{jω} − 1)} from P(K = k) = e^{−λ} λ^k / k! for k = 0, 1, 2, . . .. In other words, γ_K′(1) = E{K} = λ and γ_K″(1) = E{K²} − E{K} = λ². Therefore, E{K} = λ and Var{K} = λ. Meanwhile, the mgf of K is G_K(s) = M_K(t)|_{e^t=s} = Σ_{k=0}^{∞} s^k (λ^k/k!) e^{−λ}, i.e., G_K(s) = e^{λ(s−1)}.
9 Unless stated otherwise, an appropriate region of convergence is assumed when we consider the mgf.
The moment theorem also implies the following: similar to how a function can
be expressed in terms of the coefficients of the Taylor series or Fourier series, the
moments or central moments are the coefficients with which the pdf can be expressed.
Specifically, if we express the mgf M X (t) of a random variable X in a series expan-
sion, we have
M_X(t) = Σ_{n=0}^{∞} (E{Xⁿ}/n!) tⁿ.  (3.3.78)
Now, when the coefficients {E{Xⁿ}/n!}_{n=0}^{∞} of two distributions are the same, i.e., when
the moments are the same, the two distributions will be the same. Based on this
observation, by comparing the first few coefficients such as the mean and second
moment, we can investigate how similar a distribution is to another.
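The moment theorem can also be exercised symbolically; the sketch below differentiates the normal cf of Example 3.3.23 (assuming the sympy package) to recover the mean and variance.

```python
import sympy as sp

# Moment theorem (Theorem 3.3.4) applied to the normal cf
# phi_X(omega) = exp(-sigma^2*omega^2/2 + j*m*omega) at omega = 0.
w = sp.symbols('omega', real=True)
m, sigma = sp.symbols('m sigma', positive=True)
phi = sp.exp(-sigma**2 * w**2 / 2 + sp.I * m * w)

m1 = sp.simplify(sp.diff(phi, w).subs(w, 0) / sp.I)        # E{X} = m
m2 = sp.simplify(sp.diff(phi, w, 2).subs(w, 0) / sp.I**2)  # E{X^2} = m^2 + sigma^2

print(m1, m2, sp.simplify(m2 - m1**2))                     # variance sigma^2
```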
3.4 Conditional Distributions
In this section, we discuss the conditional distribution (Park et al. 2017; Sveshnikov 1968) of a random variable under conditions given in the form of an event.
p_{X|A}(x) = P(X = x, A) / P(A)  (3.4.1)
Example 3.4.1 Let X be the face number from rolling a fair die. When we know
that the number is an odd number, the pmf of X is
p_{X|A}(x) = { 1/3, x = 1, 3, 5;  0, otherwise }  (3.4.2)
because P(A) = 1/2 for the event A = {the number is an odd number}. ♦
Definition 3.4.2 (conditional cdf; conditional pdf) When the occurrence of an event A with P(A) > 0 is assumed, the function F_{X|A}(x) = P(X ≤ x | A) = P({X ≤ x} ∩ A) / P(A)  (3.4.3) is called the conditional cdf of X given A, and
f_{X|A}(x) = (d/dx) F_{X|A}(x)  (3.4.4)
is called the conditional pdf of X given A.
P(A | x₁ < X ≤ x₂) = {F_{X|A}(x₂) − F_{X|A}(x₁)} / {F_X(x₂) − F_X(x₁)} · P(A).  (3.4.5)
P(A|X = x) = {f_{X|A}(x) / f_X(x)} P(A),  (3.4.6)
from which we obtain
P(A) = ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx,  (3.4.8)
which is called the total probability theorem for continuous random variables. Similarly,
P(A) = Σ_{x=−∞}^{∞} P(A|X = x) p_X(x)  (3.4.9)
Example 3.4.2 Consider a rod with thickness 0. Cut the rod into two parts. Choose
one of the two parts at random and cut it into two. Find the probability PT 2 that the
three parts obtained in this way can make a triangle.
Solution Let the length of the rod be 1. As in Examples 2.3.6 and 2.3.7, let the
point of the first cutting be X . Then, the pdf of X is f X (v) = u(v)u(1 − v). Call the
interval [0, X ] the left piece and the interval [X, 1] the right piece on the real line.
When X = t, we get
P_{T2} = ∫_{−∞}^{∞} P(triangle with the three pieces | X = t) f_X(t) dt = ∫_0^1 P(triangle with the three pieces | X = t) dt  (3.4.10)
based on (3.4.8). We can make a triangle when the sum of the lengths of two pieces is larger than the length of the third piece. When 0 < t < 1/2, we should choose the right piece and the second cutting should be placed10 in (1/2, t + 1/2). Thus, we have P(triangle with the three pieces | X = t) = P(choose the right piece) P(the second cutting is in (1/2, t + 1/2) | choose the right piece), i.e.,
P(triangle with the three pieces | X = t) = (1/2) · {length of (1/2, t + 1/2)} / {length of the right piece} = (1/2) · t/(1 − t).  (3.4.11)
Similarly, when 1/2 < t < 1, we should choose the left piece and the second cutting should be placed11 in (t − 1/2, 1/2). Thus, P(triangle with the three pieces | X = t) = P(choose the left piece) P(the second cutting is in (t − 1/2, 1/2) | choose the left piece), i.e.,
P(triangle with the three pieces | X = t) = (1/2) · (1 − t)/t.  (3.4.12)
10 Denoting the location of the second cutting by y ∈ (t, 1), the lengths of the three pieces are t, y − t, and 1 − y, resulting in the condition 1/2 < y < t + 1/2 of y to make a triangle.
11 Denoting the location of the second cutting by y ∈ (0, t), the lengths of the three pieces are y, t − y, and 1 − t, resulting in the condition t − 1/2 < y < 1/2 of y to make a triangle.
Using (3.4.11) and (3.4.12) in (3.4.10), we get P_{T2} = ∫_0^{1/2} (1/2) t/(1 − t) dt + ∫_{1/2}^{1} (1/2) (1 − t)/t dt = (1/2) (−t − ln|1 − t|) |_{t=0}^{1/2} + (1/2) (−t + ln|t|) |_{t=1/2}^{1} = ln 2 − 1/2 ≈ 0.1931.
Meanwhile, considering that a triangle cannot be made if the shorter piece is chosen among the first two pieces, assume that we choose the longer of the two pieces and then cut it into two. Then, we have the probability
2P_{T2} = 2 ln 2 − 1 ≈ 0.3863  (3.4.13)
of making a triangle. ♦
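A Monte Carlo sketch of Example 3.4.2, with both ways of choosing the piece to cut; the function name and the number of trials are illustrative assumptions.

```python
import random

# Cut at X ~ U(0, 1), choose one of the two pieces (at random, or always the
# longer one), cut it again, and test the triangle inequality.
def triangle_prob(choose_longer, trials=500_000):
    count = 0
    for _ in range(trials):
        x = random.random()
        left, right = x, 1 - x
        piece = max(left, right) if choose_longer else random.choice([left, right])
        other = 1 - piece
        y = random.uniform(0, piece)
        a, b, c = other, y, piece - y
        count += (a + b > c) and (a + c > b) and (b + c > a)
    return count / trials

print(triangle_prob(False))   # about 0.1931 = ln 2 - 1/2
print(triangle_prob(True))    # about 0.3863 = 2 ln 2 - 1
```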
F_X(x) = Σ_{i=1}^{n} F_{X|B_i}(x) P(B_i),  (3.4.14)
f_X(x) = Σ_{i=1}^{n} f_{X|B_i}(x) P(B_i),  (3.4.15)
and
p_X(x) = Σ_{i=1}^{n} p_{X|B_i}(x) P(B_i),  (3.4.16)
respectively.
and
f_{X|H1}(x) = (2/3) u(x − 1/4) u(7/4 − x).  (3.4.18)
Then, we have
f_X(x) = (1/4) f_{X|H0}(x) + (3/4) f_{X|H1}(x)
       = { 1/6, −3/4 ≤ x < 1/4;  2/3, 1/4 ≤ x < 3/4;  1/2, 3/4 ≤ x < 7/4;  0, otherwise }  (3.4.19)
as the pdf of X. ♦
Theorem 3.4.3 Using (3.4.8) in (3.4.7), we get
f_{X|A}(x) = {P(A|X = x) / P(A)} f_X(x) = P(A|X = x) f_X(x) / ∫_{−∞}^{∞} P(A|X = x) f_X(x) dx,  (3.4.20)
By integrating the conditional pdf (3.4.20), the conditional cdf discussed in (3.4.3)
can be obtained as
F_{X|A}(x) = ∫_{−∞}^{x} f_{X|A}(t) dt = ∫_{−∞}^{x} P(A|X = t) f_X(t) dt / ∫_{−∞}^{∞} P(A|X = t) f_X(t) dt.  (3.4.21)
Example 3.4.4 Express the conditional cdf FX |X ≤a (x) and the conditional pdf
f X |X ≤a (x) in terms of the pdf f X and the cdf FX of a continuous random variable X .
Solution First, the conditional cdf FX |X ≤a (x) = P(X ≤ x|X ≤ a) can be written
as
P(X ≤ x, X ≤ a)
FX |X ≤a (x) = . (3.4.23)
P(X ≤ a)
3.4 Conditional Distributions 213
fX|X≤a (x)
fX (x)
0 a x
Here, we have
P(X ≤ x)
FX |X ≤a (x) =
P(X ≤ a)
FX (x)
= (3.4.24)
FX (a)
FX |X ≤a (x) = 1 (3.4.25)
when x > a because P(X ≤ x, X ≤ a) = P(X ≤ a). From (3.4.24) and (3.4.25),
we finally have
f X (x)
f X |X ≤a (x) = u(a − x), (3.4.26)
FX (a)
Example 3.4.5 Let F be the cdf of the time X of a failure for a system: i.e., F(t) =
P(X ≤ t) is the probability that the system fails before time t, and 1 − F(t) = P(X >
t) is the probability that the system does not fail before time t. We also define the
conditional rate of failure β(t) via
Letting A = {X > t} in (3.4.3) and differentiating the result, we get the conditional
F (x)
pdf f X |X >t (x) = 1−F(t) = ∞ ff(x)
(x)d x
of X . Using this result, the conditional rate of
t
failure β(t) = f X |X >t (t) can be expressed as
214 3 Random Variables
F (t)
β(t) = . (3.4.28)
1 − F(t)
x
=x 0 β(t)dt.
from − ln(1 − F(x)) Subsequently, by differentiating (3.4.29), we get
f (x) = β(x) exp − 0 β(t)dt . ♦
Example 3.4.6 When the pdf of X is f X , obtain the conditional mean of X under
the assumption A = {X ≤ a}.
Solution Using
∞ the conditional pdf (3.4.26), we can obtain the conditional mean
∞
E{X |A} = −∞ x f X |A (x)d x = −∞ x f X |X ≤a (x)d x as
a
1
E{X |A} = x f X (x)d x (3.4.31)
FX (a) −∞
∞
from (3.4.30). Here, lim E{X |A} = 1
FX (∞) −∞ x f X (x)d x. Thus, (3.4.31) is in
a→∞
agreement with that the mean E{X } can be written as E{X |Ω}, i.e.,
∞
1
E{X } = x f X (x)d x (3.4.32)
FX (∞) −∞
because FX (∞) = 1. ♦
∞Definition 3.4.3 can be extended to the conditional expected value E {g(X )|A} =
−∞ g(x)d FX |A (x) of Y = g(X ) as
3.4 Conditional Distributions 215
⎧ ∞
⎪
⎪ g(x) f X |A (x)d x, continuous random variable,
⎨ −∞
E {g(X )|A} =
∞ (3.4.33)
⎪
⎪
⎩ g(x) p X |A (x), discrete random variable
x=−∞
We have observed in Sect. 2.4 that the probability of an event can often be obtained
quite easily by first obtaining the conditional probability under an appropriate condi-
tion. We will now similarly see that obtaining the conditional expected value first will
be quite useful when we try to obtain the expected value. Evaluation of the expected
values via conditioning will be discussed again in Sect. 4.4.3.
Theorem 3.4.4 The expected value E{X } of X can be expressed as
n
E{X } = E {X |Ak } P (Ak ) , (3.4.34)
k=1
Proof We show the theorem for discrete random variables only. From (2.4.13),
∞
∞ n
n
∞
we easily get E{X } = x p X (x) = x p X |Ak (x)P ( Ak ) = x
x=−∞ x=−∞ k=1 k=1 x=−∞
n
p X |Ak (x)P ( Ak ) = E {X |Ak } P ( Ak ). ♠
k=1
Example 3.4.7 There are a red balls and b green balls in a box. Pick one ball at
random from the box: if it is red, we put it back into the box; and if it is green, we
discard it and put one red ball from another source into the box. Let X n be the number
of red balls in the box and E {X n } = Mn be the expected value of X n after repeating
this experiment n times. Show that
1
Mn = 1− Mn−1 + 1 (3.4.35)
a+b
Obtain the probability Pn that the ball picked at the n-th trial is red.
216 3 Random Variables
Solution We have
and
Mn−1
P (n-th ball is green) = 1 − . (3.4.41)
a+b
n
1 n
Letting μ = 1 − a+b 1
, we get Mn = aμn + 1−μ1−μ
= a + b − b 1 − a+b from
Mn = μMn−1 + 1 = μ(μMn−2 + 1) + 1 = μ2 Mn−2 + 1 + μ = · · · = μn M0 + 1 +
1 n−1
μ + · · · + μn−1 and M0 = a. We also have Pn = Ma+b
n−1
= 1 − a+b
b
1 − a+b for
n = 1, 2, . . .. ♦
In this section, we discuss four classes of widely-used random variables (Hahn and
Shapiro 1967; Johnson and Kotz 1970; Kassam 1988; Song et al. 2002; Thomas
1986; Zwillinger and Kokoska 1999) in detail. We start with the normal random
variables, followed by the binomial, Poisson, and exponential random variables. The
normal distributions are again discussed extensively in Chap. 5.
Definition 3.5.1 (normal random variable) A random variable with the pdf
1 (x − m)2
f (x) = √ exp − (3.5.1)
2πσ 2 2σ 2
is called the normal random variable and its distribution is denoted by N m, σ 2 .
3.5 Classes of Random Variables 217
We have already seen in (3.3.49) that the cf and mgf of N m, σ 2 are
σ2 ω2
ϕ(ω) = exp jmω − (3.5.6)
2
and
σ2 t 2
M(t) = exp mt + , (3.5.7)
2
respectively.
The tail probability of the normal distribution is used quite frequently in many
areas such as statistics, communications, and signal processing. Let us briefly discuss
an approximation of the tail probability
x of the normal distribution. First, the error
function Θ(x) = erf(x) = √2π 0 exp −t 2 dt can be expressed as
√
erf(x) = 2Φ 2x − 1 (3.5.8)
in terms of the standard normal cdf Φ(x). For the tail integral
218 3 Random Variables
∞ 2
1 t
Q(x) = √ exp − dt (3.5.9)
2π x 2
of the standard normal pdf, also called the complementary standard normal cdf, we
have
x φ(x)
φ(x) < Q(x) < (3.5.10)
1+x 2 x
sider only the case x > 0, it is known (Börjesson and Sundberg Mar. 1979) that
Q a (x) is the optimum upper bound on Q(x) when a ≈ 0.344 and b ≈ 5.334 and
optimum lower bound when a = π1 and b = 2π. In addition, when a ≈ 0.339 and
b ≈ 5.510, Q a (x) minimizes max Q a (x)−Q(x)
Q(x) .
x
F(x) = n Ck p
k n−k
q (3.5.12)
k=0
y
P(x ≤ K ≤ y) = Pn (k) (3.5.13)
k=x
3.5 Classes of Random Variables 219
of the event {x ≤ K ≤ y} for K ∼ b(n, p). The pdf of b(n, p) can be written as
n
f (x) = k n−k
n Ck p q δ(x − k).
k=0
Example 3.5.1 Let X be the number of 2’s when we roll a fair die five times. Then,
k 5 5−k
X ∼ b 5, 16 and P5 (k) = 5 Ck 16 6
for k = 0, 1, . . . , 5. ♦
Example 3.5.2 A fair die is rolled seven times. Let Y be the number of even
k 1 7−k
numbers. Then, Y ∼ b 7, 21 and P7 (k) = k7 21 2
= 128
1 7
k
for k = 0, 1,
. . . , 7. ♦
The sequence {Pn (k)}nk=0 increases until k = (n + 1) p − 1 and k = (n + 1) p
when (n + 1) p is an integer and until k = [(n + 1) p] when (n + 1) p is not an integer,
and then decreases.
Example 3.5.3 In b 3, 41 , Pn (k) is maximum at k = (n + 1) p − 1 = 0 and k = 1.
Specifically, P3 (0) = P3 (1) = 27
64
, P3 (2) = 64
9
, and P3 (3) = 64
1
. ♦
In evaluating Pn (k) = n Ck p k (1 − p)n−k , we need to calculate n Ck , which
becomes rather difficult when n is large and k is near n2 . We now discuss some
methods to alleviate this problem by considering the asymptotic approximations of
Pn (k) as n → ∞.
Definition 3.5.3 (small o) When
f (x)
lim = 0, (3.5.14)
x→∞ g(x)
the function f (x) is of lower order than g(x) for x → ∞, and is denoted by f = o(g).
Definition 3.5.3 implies that, when f = o(g) for x → ∞, f (x) increases slower
than g(x) as x → ∞.
Definition 3.5.4 (big O) Suppose that f (x) > 0 and g(x) > 0 for a sufficiently large
number x. When there exists a natural number M satisfying
f (x)
≤ M (3.5.15)
g(x)
for a sufficiently large number x, f (x) is said to be of, at most, the order of g(x) for
x → ∞, and is denoted by f = O(g).
From Definitions 3.5.3 and 3.5.4, when f (x) and g(x) are both positive for a
sufficiently large x, we have f = O(g) if f = o(g).
Example 3.5.4 We have ln x = o(x) for x → ∞ from lim lnxx = 0 for the two
x→∞
functions f (x) = ln x and g(x) = x, and x 2 = o x 3 + 1 for x → ∞ from
2
lim x 3x+1 = 0. ♦
x→∞
220 3 Random Variables
Example
3.5.8
The distribution of even numbers from 1000 rollings of a fair die
is b 1000, 21 . Thus, the Gaussian approximation of the probability that we have
−1
500 times of even numbers is P1000 (500) ≈ 2π × 1000 × 21 × 21 ≈ 0.0252.
√ −1 100
Similarly, P1000 (510) ≈ 500π e− 500 ≈ 0.0207 from (3.5.16). ♦
k2
Let us try to approximate P(k1 ≤ k ≤ k2 ) = n Ck p
k
(1 − p)n−k by making
k=k1
use of the steps in the proof of Theorem 3.5.1 shown in Appendix 3.2. First,
k2
k2
exp − (k−np)
2
when npq 1, we have n Ck p (1 − p)
k n−k
≈ √2πnpq
1
2npq
=
k=k1 k=k1
k2
− (k−np)
2
√ 1
2πnpq
exp 2npq
k + 21 − k − 21 , i.e.,
k=k1
k2
1 k2
(x − np)2
n Ck p (1 − p) ≈ √ exp −
k n−k
dx
k=k1
2πnpq k1 2npq
k2 − np k1 − np
= Φ √ −Φ √ (3.5.17)
npq npq
3.5 Classes of Random Variables 221
√
for k2 − k1 = O npq . The integral in (3.5.17) implies that the approximation
error will be small when |k1 − k2 | 1, but it could be large otherwise. To reduce
such an error, we often use the approximation
) * ) *
k2
k2 − np + 1
k1 − np − 1
n Ck p
k
(1 − p)n−k ≈ Φ √ 2
−Φ √ 2
, (3.5.18)
k=k1
npq npq
which is called the continuity correction and is considered also in (6.2.66), Example
6.2.26, and Exercise 6.10.
Example 3.5.9 In Example 3.5.8, the probability of 500, 501, or 502 times of
even numbers is P1000 (500) + P1000 (501)
+ P1000 (502)
≈ 0.0754. With the two
approximations above, we have Φ √2502
− Φ √250
0
≈ 0.0503 from (3.5.17) and
Φ √2.5
250
− Φ √−0.5
250
≈ 0.0754 from (3.5.18). ♦
λk −λ
n Ck p
k
(1 − p)n−k → e (3.5.19)
k!
for k = O(np).
Theorem 3.5.2 is called the Poisson limit theorem or the Poisson approximation
of binomial distribution.
Example 3.5.10 For b 1000, 10−3 , we have P1000 (0) = 0.9991000 ≈ 0.3677, for
which the Poisson approximation provides exp(−np) = exp(−1) ≈ 0.3679. ♦
the Poisson approximation for P(K = 0) ≈ 0.3677 is 0.3679. With the Gaussian
1
approximation, we would get √1.998π exp − 1.998
1
≈ 0.2420 from the normal distri-
bution N (1, 0.999). ♦
As we can see from Examples 3.5.10 and 3.5.11, the Poisson approximation is
more accurate than the Gaussian approximation when p is close to 0, and vice versa.
222 3 Random Variables
Definition 3.5.5 (Poisson random variable) A random variable with the pmf
λk
pk = e−λ (3.5.20)
k!
implying that the two events {ka points in Da } and {kb points in Db } are independent
of each other. The set of infinitely many points described above is called the random
Poisson points or Poisson points as defined below.
Definition 3.5.6 (random Poisson points) A collection of points satisfying the two
properties below is called random Poisson points, or simply Poisson points, with
parameter λ.
(1) P (k points in an inteval of length t) = e−λt (λt)
k
k!
.
(2) If two intervals Da and Db are non-overlapping, then the events {ka points in
Da } and {kb points in Db } are independent of each other.
The parameter λ in Definition 3.5.6 represents the average number of points in a
unit interval.
Example 3.5.12 Assume a set of Poisson points with parameter λ. Find the prob-
ability P(A|C) of A = {ka points in Da = (t1 , t2 )} when there are kc points in
Dc = (t1 , t3 ), where t1 ≤ t2 ≤ t3 .
Solution Let B = {kb points in Db } and C = {kc points in Dc }, where Db =
(t2 , t3 ) and kb = kc − ka . Then, because AC = {ka points in Da , kc points in Dc }
= {ka points in Da , kb points in Db } = AB, and Da and Db are non-overlapping,
we get P(A|C) = P(AC) P(C)
= P(AB)
P(C)
, i.e.,
P(A)P(B)
P(A|C) = . (3.5.25)
P(C)
−1
We thus finally get P(A|C) = e−λta (λtkaa)! e−λtb (λtkbb)! e−λtc (λtkcc)!
ka kb kc
, i.e.,
ka kb
kc ! ta tb
P(A|C) = , (3.5.26)
ka !kb ! tc tc
where ta = t2 − t1 , tb = t3 − t2 , and tc = t3 − t1 . ♦
Example 3.5.13 Assume a set of Poisson points. Let X be the distance from a fixed
point t0 to the nearest point to the right-hand direction. Then, we have the cdf FX (x) =
P(X ≤ x) = P (at least one point exists in (t0 , t0 + x)) can be obtained as
for x ≥ 0. Thus, we have the pdf f X (x) = λe−λx u(x) and X is an exponential random
variable. ♦
Example 3.5.14 Consider a constant α and a set of Poisson points with parameter
λ. Then, for the number N of Poisson points in the interval (0, α), we have P(N =
k) = e−λα (λα)
k
k!
. In other words, N ∼ P(λα). ♦
224 3 Random Variables
Definition 3.5.7 (exponential random variable) A random variable with the pdf
is called an exponential random variable, where the parameter λ > 0 is called the
exponential rate or rate.
∞When X is an exponential random variable with rate λ > 0, the mgf M(t) =
−λx
0 e tx
λe d x is
λ
M(t) = , t < λ, (3.5.29)
λ−t
of X , we get
>s+t)
for s, t ≥ 0 because P(X > s + t | X > t) = P(X P(X >t)
= 1−F(s+t)
1−F(t)
= e−λs . The
property expressed by (3.5.31) is called the memoryless property of the exponential
distribution.
Example 3.5.15 Assume that the lifetime of an electric bulb follows an exponential
distribution. The result (3.5.31) implies that, if the electric bulb is on at some moment,
the distribution of the remaining lifetime of the bulb is the same as that of the original
lifetime: the remaining lifetime of the bulb at any instant follows the same distribution
as a new bulb. In other words, for a bulb that is on at time t, the probability that the
bulb will be on at t + s is simply the probability that the bulb will be on for s time
units. This can be exemplified as follows. When a person finds a bulb is lit in a
place and the person does not know from when the bulb has been lit, how long does
the person expect the bulb will be on? Surprisingly, if the lifetime of the bulb is
an exponential random variable, the remaining lifetime is the same as a new bulb.
In a slightly different way, this can be described as ‘the past does not influence the
future’, which is called the Markov property. ♦
3.5 Classes of Random Variables 225
which is satisfied by only exponential distribution (Komjath and Totik 2006) among
continuous12 distributions. This can be proved as follows: Assume a function g(·)
satisfies13
Solution Let X be the waiting time for a customer at the bank to finish the transaction,
and denote by F the cdf of X . Then, the waiting time is the same as the time required
to finish the transaction, and X is an exponential random variable with rate λ = 10 1
.
Thus, we get P1 = P(X > 15) = 1 − P(X ≤ 15) = 1 − F(15) = e−15λ = e− 2 ≈
3
0.2231. Next, because an exponential random variable is not influenced by the past,
the probability P2 is the same as the probability that a customer will wait more than
5 minutes: in other words, P2 = P(X > 5) = e−5λ = e− 2 ≈ 0.6065.
1
♦
1 − F(t + 5)
P5 = , (3.5.34)
1 − F(t)
where F is the cdf of the lifetime of the bulb. Thus, the probability will be available
only if we know how long the bulb has been lit on at t. ♦
for a random variable with pdf f and cdf F. The function β(t) is the conditional rate
of failure for an object to become inoperable after being operated for t time units.
Let us now discuss the failure rate function β(t) for an exponential random vari-
able. The rate of failure of an object that has operated for t time units is the same
as that of a new object because an exponential random variable is not influenced by
−λt
the past. In other words, as we can observe from β(t) = λe e−λt
= λ, the failure rate
function for an exponential random variable is the rate of the exponential random
variable, a constant independent of t. The rate is the inverse of the mean and repre-
sents how many events occurs on the average over a unit time interval: for example,
when the time interval between occurrences of an event is an exponential random
variable with mean 10 1
, the rate λ = 10 tells us that the event occurs ten times on the
average in a unit time.
Example 3.5.18 (Yates and Goodman 1999) Consider an exponential random vari-
able X with parameter λ. Show that K = X is a geometric random variable with
parameter p = 1 − e−λ .
Appendices
We now discuss the cdf in the context of function theory (Gelbaum and Olmsted
1964).
Definition 3.A.1 (cdf) A real function F(x) possessing all of the three following
properties is a cdf:
(1) The function F(x) is non-decreasing: F(x + h) ≥ F(x) for h > 0.
(2) The function F(x) is continuous from the right-hand side: F x + = F(x).
(3) The function F(x) has the limits lim F(x) = 0 and lim F(x) = 1.
x→−∞ x→∞
A cdf is a finite and monotonic function. A point x such that F(x + ε) − F(x −
ε) > 0 for every positive number ε, F(x) = F x − , and F x + = F(x) = F x −
is called an increasing point, a continuous point, and a discontinuity, +respectively,
− of
F(x). Here, as
we have already seen in Definition 1.3.3, p x = F x − F x =
F(x) − F x − is the jump of F(x) at x. A cdf may have only type 1 discontinuity,
i.e., jump discontinuity, with every jump between 0 and 1.
Example 3.A.1 Consider the function g shown in Fig. 3.20. Here, y1 is the local
minimum of y = g(x), and x1 and x11 are the two solutions to y1 = g(x). In addition,
y2 is the local maximum of y = g(x), and x2 and x22 are the solutions to y2 = g(x).
Let x3 < x4 < x5 be the X coordinates of the crossing points of the straight line
Y = y and the function Y = g(X ) for y1 < y < y2 . Then, x11 < x3 < x2 < x4 <
x1 < x5 < x22 and y = g (x3 ) = g (x4 ) = g (x5 ) for y1 < y < y2 . Obtain the cdf FY
of Y = g(X ) in terms of the cdf FX of X , and discuss if the cdf FY is a continuous
function.
Y
g(X)
y2
Y =y
y1
0 x11 x3 x2 x4 x1 x5 x22 X
−
lim FY (y) = FX x11 . (3.A.6)
y↑y1
When y ↓ y1 , we have lim FY (y) = lim FX (x3 ) + FX (x5 ) − FX (x4 ) + P X
+ y↓y1 y↓y1
= x4 = FX x11 + FX x1+ − FX x1− + P X = x1− , i.e.,
+ − +
+
− x3 → x11 , x4 → x1 , x5 → x14
because 1 , FX x11 = FX (x11 ), FX x1+ −
FX x1 = P (X = x1 ), and, for any type of random variable X , P X = x1−
14Note that P X = k − = 0 even for a discrete random variable X because the value p X (k) =
P(X = k) is 0 when k is not an integer for a pmf p X (k).
Appendices 229
= 0. Thus, from the second line of (3.A.5), (3.A.6), and (3.A.7), the continuity of the
cdf FY (y) at y = y1 can be summarized as follows: The cdf FY (y) is (A) continuous
from the right-hand side at y = y1 and (B) continuous
− from the left-hand side at
y = y1 only if FX (x11 ) + P (X = x1 ) − FX x11 = P (X = x1 ) + P (X = x11 ) is
0 or, equivalently, only if P (X = x1 ) = P (X = x11 ) = 0.
− + −
Next,
+ when x3 →
−y ↑ y2 , recollecting that +
x2 , x4 → x2 , x5 → x22 ,
FX x2 − FX x2 = P (X = x2 ), and P X = x2 = 0, we get lim FY (y) =
−y↑y
2
lim {FX (x3 ) + FX (x5 ) − FX (x4 ) + P (X = x4 )} = FX x2− + FX x22 −
y↑y2
+ +
FX x2 + P X = x2 , i.e.,
−
lim FY (y) = FX x22 − P (X = x2 ) . (3.A.8)
y↑y2
In addition,
lim FY (y) = lim FX g −1 (y)
y↓y2 y↓y2
= FX (x22 ) (3.A.9)
+ +
because FX x22 = FX (x22 ) and g −1 (y) → x22 for y ↓ y2 . Thus, from the fourth
case of (3.A.5), (3.A.8), and (3.A.9), the continuity of the cdf FY (y) at y = y2
can be summarized as follows: The cdf FY (y) is (A) continuous from the right-
hand side at y = y2 and (B) continuous from the left-hand side at y = y2 only
−
if FX (x22 ) − FX x22 + P (X = x2 ) = P (X = x2 ) + P (X = x22 ) is 0 or, equiv-
alently, only if P (X = x2 ) = P (X = x22 ) = 0. Exercises 3.56 and 3.57 also deal
with the continuity of the cdf. ♦
+ −
Let {xν } be the set of discontinuities of the cdf F(x), and pxν = F xν − F xν
be the jump of F(x) at x = xν . Denote by Ψ (x) the sum of jumps of F(x) at
discontinuities not larger than x, i.e.,
Ψ (x) = pxν . (3.A.10)
xν ≤x
The function Ψ (x) is increasing only at {xν } and is constant in a closed interval not
containing xν : thus, it is a step-like function described following Definition 3.1.7.
If we now let
ψ(x) = F(x) − Ψ (x), (3.A.11)
then ψ(x) is a continuous function while Ψ (x) is continuous only from the right-
hand side. In addition, Ψ (x) and ψ(x) are both non-decreasing and satisfy Ψ (−∞)
= ψ(−∞) = 0, Ψ (+∞) = a1 ≤ 1, and ψ(+∞) = b ≤ 1. Here, the functions
Fd (x) = a11 Ψ (x) and Fc (x) = b1 ψ(x) are both cdf, where Fd (x) is a step-like func-
tion and Fc (x) is a continuous function. Rewriting (3.A.11), we get
230 3 Random Variables
Fig. 3.21 The decomposition of cdf F(x) = Ψ (x) + ψ(x) with a step-like function Ψ (x) and a
continuous function ψ(x)
i.e.,
Figure 3.21 shows an example of the cdf F(x) = Ψ (x) + ψ(x) with a step-like
function Ψ (x) and a continuous function ψ(x).
The decomposition (3.A.13) is unique because the decomposition shown in
(3.A.12) is unique. Let us prove this result. Assume that two decompositions
of F(x) = Ψ (x) + ψ(x) = Ψ1 (x) + ψ1 (x) are possible. Rewriting this equation,
we get Ψ (x) − Ψ1 (x) = ψ1 (x) − ψ(x). The left-hand side is a step-like function
because it is the difference of two step-like functions and, similarly, the right-hand
side is a continuous function. Therefore, both sides should be 0. In other words,
Ψ1 (x) = Ψ (x) and ψ1 (x) = ψ(x). By the discussion so far, we have shown the fol-
lowing theorem:
Theorem 3.A.1 Any cdf F(x) can be decomposed as
In Theorem 3.A.1, Fd (x) and Fc (x) are called the discontinuous or discrete part
and continuous part, respectively, of F(x). Now, from Theorem 3.1.3, we see that
there exists a countable set D such that D d Fd (x) = 1, where the integral is the
Lebesgue-Stieltjes integral. The function Fc (x) is continuous but not differentiable
at all points. Yet, any cdf is differentiable at almost every point and thus, based on
the Lebesgue decomposition theorem, the continuous part Fc (x) in (3.A.14) can be
decomposed into two continuous functions Fac (x) and Fs (x) as
Theorem 3.A.2 Any cdf F(x) can be decomposed into a step-like function Fd (x),
an absolutely continuous function Fac (x), and a singular function Fs (x) as
where a1 ≥ 0, a2 ≥ 0, a3 ≥ 0, and a1 + a2 + a3 = 1.
Example 3.A.2 Let us consider an example of a singular cdf. Assume the closed
interval [0, 1] and the ternary expression
∞
ai (x)
x=
i=1
3i
= 0.a1 (x)a2 (x)a3 (x) · · · (3.A.17)
of a point x in the Cantor set C discussed in Example 1.1.46, where ai (x) ∈ {0, 1}.
Now, let n(x) be the location of the first 1 in the ternary expression (3.A.17) of x with
n(x) = ∞ if no 1 appears eventually. Define a cdf as (Romano and Siegel 1986)
⎧
⎪ 0, x < 0,
⎪
⎪ g(x),
⎨ x ∈ ([0, 1] − C) ,
F(x) = c jj , x=
2c j
∈ C, (3.A.18)
⎪
⎪ j 2 3j
⎪
⎩ j
1, x ≥ 1,
232 3 Random Variables
i.e., as
⎧
⎨ 0, x < 0,
F(x) = φC (x), 0 ≤ x ≤ 1,
⎩
1, x ≥1
⎧
⎪
⎪ 0, x ≤ 0,
⎨
n(x)
ai (x)
= 21+n(x)
1
+ , 0 ≤ x ≤ 1, (3.A.19)
⎪
⎪
2i+1
⎩ i=1
1, x ≥1
with the open interval A2c1 ,2c2 ,...,2ck−1 defined by (1.1.41) and (1.1.42). Then, it is easy
to see that F(x) is a continuous cdf. In addition, as we have observed in Example
1.3.11, the derivative of F(x) is 0 at almost every point in ([0, 1] − C) and thus is
not a pdf. In other words, F(x) is not an absolutely continuous cdf but is a singular
cdf.
Some specific values of the cdf (3.A.19) are as follows: We have F 19 =
2
ai ( 19 )
1
21+2
+ = 1 + 1 = 1 from 1 = 0.01 and n 1 = 2, we have F 2 =
2i+1 8 8 4 9 3 9 9
i=1
∞
ai ( 29 ) 2 1
1
2∞
+ 2i+1
= 1
4
from 2
9
= 0.023 and n 9
= ∞, and we have F 3
= 1
21+1
+
i=1
1
ai ( 13 )
2i+1
=from 13 = 0.13 and n 13 = 1. Similarly, from 23 = 0.23 and n 23 =
1
2
i=1
∞, we have F 23 = 0 + 24 = 21 ; from 0 = 0.03 and n(0) = ∞, we have F(0) = 0;
∞
and from 1 = 0.22 · · ·3 and n(1) = ∞, we have F(1) = 0 + 2
2i+1
= 1. ♦
i=1
The inverse cdf F −1 , the inverse function of the cdf F, can be defined specifically
as
1 3
1
2
2
1 1
3
0 1 2 3 x 0 1 1 1 u
3 2
for y1 < y2 . In other words, like a cdf, an inverse cdf is a non-decreasing function.
For example, we have F1−1 13 = inf x : F1 (x) ≥ 13 = inf {x : x ≥ 1} = 1 and
F1−1 21 = inf x : F1 (x) ≥ 21 = inf {x : x ≥ 1} = 1. Note that, unlike a cdf, an
inverse cdf is continuous from the left-hand side. Figure 3.22 shows the cdf F1 and
the inverse cdf F1−1 . ♦
Theorem 3.A.3 (Hajek et al. 1999) Let F and F −1 be a cdf and its inverse, respec-
tively. Then,
F F −1 (u) ≥ u (3.A.25)
if F is continuous.
234 3 Random Variables
Proof Let Su be the set15 of all x such that F(x) ≥ u, and x L be the smallest number
in Su . Then,
F −1 (u) = x L (3.A.27)
and
F (x L ) ≥ u (3.A.28)
because F(x) ≥ u for any point x ∈ Su . In addition, because Su is the set of all x
such that F(x) ≥ u, we have F(x) < u for any point x ∈ Suc . Now, x L is the smallest
number in Su and thus x L − ∈ Suc because x L − < x L when > 0. Consequently,
we have
F (x L − ) < u (3.A.29)
when > 0. Using (3.A.27) in (3.A.28), we get F F −1 (u) ≥ u. Next, recollect-
−1
ing (3.A.27),
and combining (3.A.28) and (3.A.29), we
get F F (u) − < u ≤
F F −1 (u) . In other words,
u is a number
between F F −1(u) − and F F −1 (u) .
Now, u = F F −1 (u) because lim F F −1 (u) − = F F −1 (u) if F is a contin-
→0
uous function. ♠
Example 3.A.4 For the cdf
⎧
⎨ 0, x < 0; x
4
, 0 ≤ x < 1;
F2 (x) = 1
, 1 ≤ x < 2; 1
(x + 1), 2 ≤ x < 3; (3.A.30)
⎩2 4
1, x ≥ 3
we have F2 F2−1 41 = F2 (1) = 21 ≥ 41 , F2 F2−1 21 = F2 (1) = 21 ≥ 21 , F2−1
(F2 (1)) = F2−1 21 = 1 ≤ 1, and F2−1 F2 23 = F2−1 21 = 1 ≤ 23 . Figure 3.23
shows the cdf F2 and the inverse cdf F2−1 . ♦
Note that even if F is a continuous function, we have not F −1 (F(x)) = x but
F −1 (F(x)) ≤ x. (3.A.32)
15 Here, because a cdf is continuous from the right-hand side, Su is either in the form Su = [x L , ∞)
or in the form Su = {x L , x L + 1, . . .}.
Appendices 235
1 3
3
4
2
1
2
1 1
4
0 1 2 3 x 0 1 1 3 1 u
4 2 4
P F −1 (F(X )) = X = 0 (3.A.33)
when X ∼ F.
Example 3.A.5 Consider the cdf F1 (x) and the inverse cdf F1−1 (u) discussed in
Example 3.A.3. Then, we have
⎧
⎪
⎪ 0, F1−1 (u) ≤ 0,
⎪ 1 −1
⎪
−1 ⎨ 31 F1 (u), 0 ≤ F1−1 (u) < 1,
−1
F1 F1 (u) = 2 , 1 ≤ F1−1 (u) ≤ 2,
⎪
⎪ −1
⎪
⎪
1
F (u) − 1 , 2 ≤ F1 (u) ≤ 3,
⎩2 1
1, F1−1 (u) ≥ 3
u, 0 < u < 13 ; 21 , 13 ≤ u ≤ 21 ;
= (3.A.34)
u, 21 < u < 1
and
3F1 (x), 0 < F1 (x) ≤ 13 ; 1, 1
≤ F1 (x) ≤ 21 ;
F1−1 (F1 (x)) = 3
2F1 (x) + 1, 21 < F1 (x) < 1
x, 0 < x < 1; 1, 1 ≤ x ≤ 2;
= (3.A.35)
x, 2 < x < 3,
which
are shown
in Fig. 3.24. The results (3.A.34) and (3.A.35) clearly confirm
F1 F1−1 (u) ≥ u shown in (3.A.26) and F1−1 (F1 (x)) ≤ x shown in (3.A.32). ♦
The results of Example 3.A.5 and Exercises 3.79 and 3.80 imply the following:
In general, even when a cdf is a continuous function, if it is constant
over an inter-
val, its inverse is discontinuous. Specifically, if F(a) = F b− = α for a < b or,
equivalently, if F(x) = α over a ≤ x < b, the inverse cdf F −1 (u) is discontinuous
at u = α and F −1 (α) = a. On the other hand, even if a cdf is not a continuous func-
tion, its inverse is a continuous function when the cdf is not constant over an interval.
Figure 3.25 shows an example of a discontinuous cdf with a continuous inverse.
236 3 Random Variables
1 1 3
1 1
2
2 2
1 1 1
3 3
0 1 2 3 x 0 1 1 1 u 0 1 2 3 x
3 2
Fig. 3.24 The results F1 F1−1 (u) and F1−1 (F1 (x)) from the cdf F1
F (x) F −1 (u)
1 2
2
3
1
1
3
0 1 2 x 0 1 2 1 u
3 3
Theorem 3.A.4 (Hajek et al. 1999) If a cdf F is continuous, its inverse F −1 is strictly
increasing.
−1
Proof When F is continuous, −1assume ) = F −1 (y2 ) for y1 < y2 . Then, from
F (y1−1
(3.A.26), we have y1 = F F (y1 ) = F F (y2 ) = y2 , which is a contradiction
to y1 < y2 . In other words,
when y1 < y2 . From (3.A.22) and (3.A.36), we have F −1 (y1 ) < F −1 (y2 ) for y1 <
y2 when the cdf F is continuous. ♠
Theorem 3.A.5 (Hajek et al. 1999) When the pdf f is continuous, we have
d −1 1
F (u) = (3.A.37)
du −1
f F (u)
if f F −1 (u) = 0, where F is the cdf.
Appendices 237
d −1 1
F (u) = −1 (3.A.38)
du f F (u)
because F(v) = F F −1 (u) = u and du = f (v)dv from (3.A.26). ♠
n!
n Ck p
k
(1 − p)n−k = p np+l q nq−l
(np + l)!(nq − l)!
n! p np q nq (np)!(nq)! pl
= × . (3.A.39)
(np)!(nq)! (np + l)!(nq − l)! q l
√ n
First, using the Stirling approximation n! ≈ 2πn ne , which can be obtained
√ n n √
n n
from 2πn e < n! < 2πn 1 + 4n 1
e
, the first part of the right-hand side
of (3.A.39) becomes
√
n! p np q nq 2πnn n e−n
≈ √ √ p np q nq
(np)!(nq)! 2πnp(np)np e−np 2πnq(nq)nq e−nq
√
2πn n n p np q nq e−n
= √ √ × × −np −nq
2πnp 2πnq (np) (nq)
np nq e e
1
= √ . (3.A.40)
2πnpq
Letting t j = j
npq
and using e x ≈ 1 + x for x ≈ 0, the first part of the right-hand
,
(
+
l −1
(np)!(npq)l (np)l
side of (3.A.41) becomes (np+l)!q l = (np+l)(np+l−1)···(np+1) = 1 + npj
=
j=1
238 3 Random Variables
(−1 −1
l
+ +
l
1 + qt j ≈ e qt j
and the second part of the right-hand side
j=1 j=1
+
l
(nq+1− j)
+
l
(nq)! pl
of (3.A.41) can be rewritten as (nq−l)!(npq)l
= j=1
(nq)l
= nq+1
nq
− j
nq
≈
j=1
+
l l
+ +
l
1− j
nq
= 1 − pt j ≈ e− pt j . Employing these two results in (3.A.41),
j=1 j=1 j=1
+
l +
l
(np)!(nq)! pl e− pt j
we get (np+l)!(nq−l)! q l
≈ eqt j
= e−t j = exp − l(l+1)
2npq
, i.e.,
j=1 j=1
(np)!(nq)! pl l2
≈ exp − . (3.A.42)
(np + l)!(nq − l)! q l 2npq
Now, recollecting l = k − np, we get the desired result (3.5.16) from (3.A.39),
(3.A.40), and (3.A.42).
For k = 0, we have
n
because lim (1 − p)n = lim 1 − λn = e−λ when np → λ. Next, for k =
n→∞ n→∞
1, 2, . . . , n, we have
n(n − 1) · · · (n − k + 1) k
Pn (k) = p (1 − p)n−k
k!
(np)k 1 − n1 1 − n2 · · · 1 − k−1
= (1 − p) n n
. (3.A.44)
k! (1 − p)k
p(k) = −r Ck α (α
r
− 1)k , k = 0, 1, . . . (3.A.54)
r
α
M(t) = (3.A.55)
1 − (1 − α)et
p(k) = k−1 Cr −1 α (1
r
− α)k−r , k = r, r + 1, . . . (3.A.56)
r
αet
M(t) = (3.A.57)
1 − (1 − α)et
240 3 Random Variables
λk −λ
p(k) = e , k = 0, 1, . . . (3.A.58)
k!
M(t) = exp −λ(1 − et ) (3.A.59)
Uniform distribution
1
p(k) = , k = 0, 1, . . . , n − 1 (3.A.60)
n
1 − ent
M(t) = (3.A.61)
n(1 − et )
α 1
f (x) = (3.A.62)
π (x − β)2 + α2
ϕ(ω) = exp ( jβω − α|ω|) (3.A.63)
x 2 −1 x
n
λ −λ|x|
f (x) = e (3.A.70)
2
λ2
M(t) = 2 (3.A.71)
λ − t2
Normal distribution
1 (x − m)2
f (x) = √ exp − (3.A.72)
2πσ 2 2σ 2
2 2
σ t
M(t) = exp mt + (3.A.73)
2
16 In (3.A.75), the function Φ(·) is the standard normal cdf defined in (3.5.3).
242 3 Random Variables
Exercises
Exercise 3.5 Obtain the pmf of X in Example 3.1.17 assuming that each ball taken
is replaced into the box before the following trial.
Exercise 3.6 Obtain the pdf and cdf of Y = X 2 + 1 when X ∼ U [−1, 2).
Exercise 3.7 Obtain the cdf of Y = X 3 − 3X when the pmf of X is p X (k) = 17 for
k ∈ {0, ±1, ±2, ±3}.
Exercise 3.8 Obtain the expected value E X −1 when the pdf of X is f X (r ) =
r 2 −1 exp − r2 u(r ).
1 n
n
2 2 Γ ( n2 )
Exercise 3.9 Express the pdf of the output Y = X u(X ) of a half-wave rectifier in
terms of the pdf f X of X .
Exercises 243
2
Exercise 3.10 Obtain the pdf of Y = X − 1θ when the pdf of X is f X (x) =
θe−θx u(x).
Exercise 3.11 Let the pdf and cdf of a continuous random variable X be f X and
FX , respectively. Obtain the conditional cdf FX |b<X ≤a (x) and the conditional pdf
f X |b<X ≤a (x) in terms of f X and FX , where a > b.
Exercise 3.12 For X ∼ U [0, 1), obtain the conditional mean E{X |X > a} and
2
conditional variance Var{X |X > a} = E X − E{X |X > a} X > a when 0 <
a < 1. Obtain the limits of the conditional mean and conditional variance when
a → 1.
Exercise 3.13 Obtain the probability P(950 ≤ R < 1050) when the resistance R of
a resistor has the uniform distribution U (900, 1100).
Exercise 3.14 The cost of being early and late by s minutes for an appointment is
cs and ks, respectively. Denoting by f X the pdf of the time X taken to arrive at the
location of the appointment, find the time of departure for the minimum cost.
Exercise 3.15 Let ω be the outcome of a random experiment of taking one ball from
a box containing one each of red, green, and blue balls. Obtain P(X ≤ α), P(X ≤ 0),
and P(2 ≤ X < 4), where
π, ω = green ball or blue ball,
X (ω) = (3.E.5)
0, ω = red ball
Exercise 3.17 In successive tosses of a fair coin, let the number of tosses until the
first head be K . For the two events A = {K > 5} and B = {K > 10}, obtain the
probabilities of A, B, B c , A ∩ B, and A ∪ B.
Exercise 3.18 Data is transmitted via a sequence of N bits through two independent
channels C A and C B . Due to channel noise during the transmission, w A and w B bits
are in error among the sequences A and B of N bits received through channels C A
and C B , respectively. Assume that the noise on a bit does not influence that on others.
(1) Obtain the probability P(D = d) that the number D of error bits common to A
and B is d.
(2) Assume the sequence of N bits is reconstructed by selecting each bit from A
with probability p or from B with probability 1 − p. Obtain the probability
P(K = k) that the number K of error bits is k in the reconstructed sequence of
N bits.
(3) When N = 3 and w A = w B = 1, obtain P(D = d) and P(K = k).
244 3 Random Variables
of X .
(1) Obtain P(0 < X < 1) and P(1 ≤ X < 1.5).
(2) Obtain the mean μ = E{X } and variance σ 2 = Var{X } of X .
Exercise 3.30 Find a function g for Y = g(X ) with which we can obtain the expo-
nential random variable Y ∼ f Y (y) = λe−λy u(y) from the uniform random variable
X ∼ U [0, 1).
Exercise 3.32 Assume p X (k) = 17 for k ∈ {1, 2, 3, 5, 15, 25, 50}. Show that the
− c|} is 5. Compare this value with the value of
value of c that minimizes E{|X
b that minimizes E (X − b)2 .
Exercise 3.33 Obtain the mean and variance of a random variable X with pdf f (x) =
λ −λ|x|
2
e , where λ > 0.
r α−1 (1−r )β−1
Exercise 3.34 For a random variable X with pdf f (r ) = B̃(α,β)
u(r )u(1 − r ),
show that the k-th moment is
Γ (α + k)Γ (α + β)
E Xk = (3.E.8)
Γ (α + β + k)Γ (α)
Exercise 3.35 Show that E{X } = R (0) and Var(X ) = R (0), where R(t) =
ln M(t) and M(t) is the mgf of X .
Exercise 3.38 For a random variable X such that P(0 ≤ X ≤ a) = 1, show the
following:
(1) E X 2 ≤ aE{X }. Specify when the equality holds true.
(2) Var{X } ≤ E{X }(a − E{X }).
2
(3) Var{X } ≤ a4 . Specify when the equality holds true.
(1) When X can assume values in {0, 1, 2, . . .}, show that the expected value E{X } =
∞
nP(X = n) is
n=0
∞
E{X } = P(X > n)
n=0
∞
= P(X ≥ n). (3.E.9)
n=1
(2) Express E X c− in terms of FX (x) = P(X ≤ x), where X c− = min(X, c) for a
constant c.
(3) Express E X c+ in terms of FX (x), where X c+ = max(X, c) for a constant c.
Exercise 3.40 Assume the cf ϕ X (ω) = exp μ exp λ e jω − 1 − 1 of X ,
where λ > 0 and μ > 0.
(1) Obtain E{X } and Var{X }.
(2) Show that P(X = 0) = exp −μ 1 − e−λ .
is the pdf of X . Sketch the cdf and pdf, and obtain P(X ≤ 6).
Exercise
α ∞median α of a1continuous random variable X can be defined via
3.43 The
−∞ X f (x)d x = α f X (x)d x = 2 . Show that the value of b minimizing E{|X − b|}
is α.
Exercise 3.44 When F is the cdf of continuous random variable X , obtain the
expected value E{F(X )} of F(X ).
Exercise 3.45 Obtain the mgf and cf of a negative exponential random variable X
with pdf f X (x) = e x u(−x). Using the mgf, obtain the first four moments. Compare
the results with those obtained directly.
Exercise 3.47 When f (x) = coshαn (βx) is a pdf, determine the value of α in terms of
β. Note that this pdf is the same as the logistic pdf (2.5.30) when n = 2 and β = 2k .
Exercise 3.48 Show that the mgf is as shown in (3.A.75) for the Rayleigh random
x2
variable with pdf f (x) = αx2 exp − 2α 2 u(x).
1
P(X = an even number) = 1 + (q − p)n (3.E.11)
2
for k = 0, 1, . . . , n − 1.
1
P(X = an even number) = 1 + e−2λ (3.E.13)
2
for k = 0, 1, 2, . . ..
Exercise 3.52 Find a cdf that has infinitely many jumps in the finite interval (a, b).
is the cdf of X .
248 3 Random Variables
(1) Obtain the condition that the two constants a and b should satisfy and sketch the
region of the condition on a plane with the a-b coordinates.
(2) When a = 18 and b = 58 , obtain the cdf of Y = X 2 , P(Y = 1), and P(Y = 4).
of X .
Exercise 3.56 Discuss the continuity of the cdf FY (y) shown in (3.E.18) when X is
a continuous random variable.
of X .
2π 2
Exercise 3.59 Assume the cf ϕ(ω) = 1
2π 0 exp − ω2 α(θ) dθ of a random vari-
able.
(1) Obtain the cf’s by completing the integral when α(θ) = 1
2
and when α(θ) =
cos2 θ.
Exercises 249
for p = 0, 1, . . . when
Y ∼ N m, σ 2 . The result (3.E.24)
implies that we have
E Y 0 = 1, E Y 1 = m, E Y 2 = m 2 + σ 2 , E Y 3 = m 3 + 3mσ 2 , E Y 4 =
m 4 + 6m 2 σ 2 + 3σ 4 , E Y 5 = m 5 + 10m 3 σ 2 + 15mσ 4 , . . . when Y ∼ N m, σ 2 .
(Hint. Note that Y = σ X + m.)
and obtain themean and variance for a Rayleigh random variable X with pdf f X (x) =
x2
x
α2
exp − 2α 2 u(x).
Exercise 3.63 For a random variable with pdf f X (x) = 21 u(x)u(π − x) sin x, obtain
the mean and second moment.
Exercise 3.65 Consider the absolute value Y = |X | for a continuous random vari-
able X with pdf f X . If we consider the half mean17
∞
m ±X = x f X (x)u(±x)d x, (3.E.28)
−∞
E{|X |} = m +X − m −X , (3.E.29)
and obtain the variance of Y = |X | in terms of the variance and half means of X .
Exercise 3.66 For a continuous random variable X with cdf FX , show that
and
Exercise 3.67 For a random variable X with pmf p(k) = (1 − α)k α for k ∈
{0, 1, . . .}, obtain the mean and variance. For a random variable X with pmf
p(k) = (1 − α)k−1 α for k ∈ {1, 2, . . .}, obtain the mean and variance.
∞
More generally, 0 x m f X (x)d x for m = 1, 2, . . . are called the half moments, incomplete
17
where α, β, and γ are natural numbers such that α + β ≥ γ. Note that min(β, γ) ≥
max(0, γ − α) and that the pmf (3.E.32) is zero when x ∈ / {max(0, γ − α), max(0,
γ − α) + 1, . . . , min(β, γ)}. Obtain the mean and variance of X . Show that the ν-th
moment of X is
ν
α+β−k
ν γ−k
ν
E {X } = [β]k α+β . (3.E.33)
k=0
k γ
Here,
1
k
ν i k
= (−1) (k − i)ν (3.E.34)
k k! i=0 i
is the Stirling number of the second kind, and α + β, β, and γ represent the size of
the group, number of ‘successes’, and number of trials, respectively.
−kx
Exercise 3.69 For a random variable X with pdf f (x) = ke −kx 2 , show that
(1+e )
E X = 3k 2 , E X = 15k 4 , and m L = k = −m L , where the half means m +
2 π2 4 7π 4 + ln 2 −
L
and m −L are defined in (3.E.28).
Exercise 3.70 Obtain the mgf, expected value, and variance of a random variable Y
λn
with pdf f Y (x) = (n−1)! x n−1 e−λx u(x).
Exercise 3.71 A coin with probability p of head is tossed twice in one trial. Define
X n as
⎧
⎨ 1, if the outcome is head and then tail,
X n = −1, if the outcome is tail and then head, (3.E.35)
⎩
0, if the two outcomes are the same
based on the two outcomes from the n-th trial. Obtain the cdf and mean of Y =
min{n : n ≥ 1, X n = 1 or − 1}.
Exercise 3.72 Assume a cdf F such that F(x) = 0 for x < 0, F(x) < 1 for 0 ≤
x < ∞, and
1 − F(x + y)
= 1 − F(x) (3.E.36)
1 − F(y)
252 3 Random Variables
for 0 ≤ x < ∞ and 0 ≤ y < ∞. Show that there exists a positive number β satisfying
1 − F(x) = exp − β u(x).
x
Exercise 3.74 In the distribution b 10, 13 , at which value of k is P10 (k) the largest?
At which value of k is P11 (k) the largest in the distribution b 11, 21 ?
Exercise 3.75 The probability of a side effect from a flu shot is 0.005. When 1000
people get the flu shot, obtain the following probabilities and their approximate
values:
(1) The probability P01 that at most one person experiences the side effect.
(2) The probability P456 that four, five, or six persons experience the side effect.
Exercise 3.76 Show that the skewness and kurtosis18 of b(n, p) are √ 1−2 p and
np(1− p)
3(n−2)
n
+ 1
np(1− p)
, respectively, based on Definition 3.3.10.
Confirm that F3 F3−1 (u) = u, F3−1 (F3 (x)) ≤ x, and P F3−1 (F3 (X )) = X = 0
when X ∼ F3 . Sketch F3 (x), F3−1 (u) and F3−1 (F3 (x)).
Confirm that F4 F4−1 (u) = u, F4−1 (F4 (x)) ≤ x, and P F4−1 (F4 (X )) = X = 0
when X ∼ F4 . Sketch F4 (x), F4−1 (u), and F4−1 (F4 (x)).
1 1
f X (x) = u(x)u(1 − x) + u(x − 1)u(3 − x)
2
⎧1 4
⎨ 2 , 0 ≤ x < 1,
⎪
= 1
, 1 ≤ x < 3, (3.E.40)
⎪
⎩
4
0, otherwise,
√
obtain and sketch the pdf of Y = X.
References
N. Balakrishnan, Handbook of the Logistic Distribution (Marcel Dekker, New York, 1992)
E.F. Beckenbach, R. Bellam, Inequalities (Springer, Berlin, 1965)
P.J. Bickel, K.A. Doksum, Mathematical Statistics (Holden-Day, San Francisco, 1977)
P.O. Börjesson, C.-E.W. Sundberg, Simple approximations of the error function Q(x) for commu-
nications applications. IEEE Trans. Commun. 27(3), 639–643 (Mar. 1979)
W. Feller, An Introduction to Probability Theory and Its Applications, 3rd edn., revised printing
(Wiley, New York, 1970)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
B.R. Gelbaum, J.M.H. Olmsted, Counterexamples in Analysis (Holden-Day, San Francisco, 1964)
G.J. Hahn, S.S. Shapiro, Statistical Models in Engineering (Wiley, New York, 1967)
J. Hajek, Nonparametric Statistics (Holden-Day, San Francisco, 1969)
J. Hajek, Z. Sidak, P.K. Sen, Theory of Rank Tests, 2nd edn. (Academic, New York, 1999)
N.L. Johnson, S. Kotz, Distributions in Statistics: Continuous Univariate Distributions, vol. I, II
(Wiley, New York, 1970)
S.A. Kassam, Signal Detection in Non-Gaussian Noise (Springer, New York, 1988)
A.I. Khuri, Advanced Calculus with Applications in Statistics (Wiley, New York, 2003)
P. Komjath, V. Totik, Problems and Theorems in Classical Set Theory (Springer, New York, 2006)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn.
(Prentice Hall, New York, 2008)
M. Loeve, Probability Theory, 4th edn. (Springer, New York, 1977)
E. Lukacs, Characteristic Functions, 2nd edn. (Griffin, London, 1970)
R.N. McDonough, A.D. Whalen, Detection of Signals in Noise, 2nd edn. (Academic, New York,
1995)
D. Middleton, An Introduction to Statistical Communication Theory (McGraw-Hill, New York,
1960)
A. Papoulis, The Fourier Integral and Its Applications (McGraw-Hill, New York, 1962)
254 3 Random Variables
A. Papoulis, S.U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn.
(McGraw-Hill, New York, 2002)
S.R. Park, Y.H. Kim, S.C. Kim, I. Song, Fundamentals of Random Variables and Statistics (in
Korean) (Freedom Academy, Paju, 2017)
V.K. Rohatgi, A.KMd.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New
York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
I. Song, J. Bae, S.Y. Kim, Advanced Theory of Signal Detection (Springer, Berlin, 2002)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
A.A. Sveshnikov (ed.), Problems in Probability Theory, Mathematical Statistics and Theory of
Random Functions (Dover, New York, 1968)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
R.D. Yates, D.J. Goodman, Probability and Stochastic Processes (Wiley, New York, 1999)
D. Zwillinger, S. Kokoska, CRC Standard Probability and Statistics Tables and Formulae (CRC,
New York, 1999)
Chapter 4
Random Vectors
Definition 4.1.1 (random vector; continuous random vector; discrete random vec-
tor) A vector consisting of a number of random variables is called a random vector,
multi-dimensional random vector, multi-variate random variables, or joint random
variables. If the components of a random vector are all continuous random vari-
ables or all discrete random variables, then the random vector is called a continuous
random vector or a discrete random vector, respectively.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 255
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8_4
256 4 Random Vectors
Often, the terms random variable, random vector, and random process are used inter-
changeably.
When the size of a random vector is n, it is called an n-dimensional, n-variate,
or n-variable random vector. When some of the components of a random vector are
discrete random variables and some are continuous random variables, the random
vector is called a mixed-type or hybrid random vector. In this book, we mostly
consider only continuous and discrete random vectors.
FX (x) = P (X 1 ≤ x1 , X 2 ≤ x2 , . . . , X n ≤ xn ) , (4.1.1)
describing the probabilistic characteristics of X via the probability of the joint event
n
∩ {X i ≤ xi }, is called the joint cdf of X, where x = (x1 , x2 , . . . , xn ).
i=1
Example 4.1.1 Letting X 1 and X 2 be the first and second numbers on the face of a
fair die from two rollings, X = (X 1 , X 2 ) is a bi-variate discrete random vector. ♦
Example 4.1.2 (Thomas 1986) A fair coin is tossed three times. Let X =
(X 1 , X 2 , X 3 ), where X i denotes the outcome from the i-th toss with 1 and 0 repre-
senting the head and tail, respectively. Then, the value of the discrete random vector
X is one of (0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0),(1, 1, 0), (1, 0, 1), (0, 1, 1), and
3
(1, 1, 1). The joint cdf FX (x) = P ∩ {X i ≤ xi } of X is
i=1
⎧
⎪
⎪ 0, {x1 < 0} , {x2 < 0} , or {x3 < 0} ;
⎪
⎪
⎪
⎪
1
, {0 ≤ x1 < 1, 0 ≤ x2 < 1, 0 ≤ x3 < 1} ;
⎪
⎪
8
⎪
⎪
1
, {0 ≤ x1 < 1, 0 ≤ x2 < 1, x3 ≥ 1} ,
⎪
⎪
4
⎨ {0 ≤ x1 < 1, x2 ≥ 1, 0 ≤ x3 < 1} , or
FX (x) = {x1 ≥ 1, 0 ≤ x2 < 1, 0 ≤ x3 < 1} ; (4.1.2)
⎪
⎪
⎪
⎪
1
, {0 ≤ x1 < 1, x2 ≥ 1, x3 ≥ 1} ,
⎪
⎪
2
⎪
⎪ {x1 ≥ 1, 0 ≤ x2 < 1, x3 ≥ 1} , or
⎪
⎪
⎪
⎪ {x1 ≥ 1, x2 ≥ 1, 0 ≤ x3 < 1} ;
⎩
1, {x1 ≥ 1, x2 ≥ 1, x3 ≥ 1} ;
where x = (x1 , x2 , x3 ). ♦
Proof We prove the theorem in the bi-variate case because the proofs in other cases
are similar. Assume a sequence {yn }∞ n=1 increasing to infinity. Then, the events
{X 1 ≤ x, X 2 ≤ yn }∞
n=1 are increasing sets, and thus lim {X 1 ≤ x, X 2 ≤ yn } =
n→∞
∞
∪ {X 1 ≤ x, X 2 ≤ yn } = {X 1 ≤ x, X 2 ≤ ∞} = {X 1 ≤ x}. This result implies
n=1
lim P (X 1 ≤ x, X 2 ≤ yn ) = P (X 1 ≤ x) from the continuity of probability. In other
n→∞
words, we have
where Is = {s1 , s2 , . . . , sm } and x s = xs1 , xs2 , . . . , xsm .
Definition 4.1.3 (joint pdf) For a measurable space (Ω, F) = Rk , B Rk , a real
function f is called a (k-dimensional) joint pdf if it satisfies
f (x) ≥ 0, x ∈ Rk (4.1.6)
and
f (x)d x = 1, (4.1.7)
Rk
where d x = d x1 d x2 · · · d xk .
Often, a joint pdf is called simply a pdf if it does not incur any confusion. Consider
the set function
∂n
f X (x) = FX (x), (4.1.9)
∂x
where ∂ x = ∂ x1 ∂ x2 · · · ∂ xn .
Conversely, based on Definition 4.1.2 and Theorem 4.1.2, the joint cdf can be
obtained as
xn xn−1 x1
FX (x) = ··· f X (t) d t (4.1.10)
−∞ −∞ −∞
from the joint pdf, where t = (t1 , t2 , . . . , tn ) and d t = dt1 dt2 · · · dtn . The joint cdf
FX and joint pdf f X characterize the probabilistic properties of X.
The marginal pdf f X i (xi ) = ddxi FX i (xi ) of X i can be obtained as
∞ ∞ ∞ xi ∞ ∞ ∞
d
f X i (xi ) = ··· ··· f X (t) d t
d xi−∞ −∞ −∞ −∞ −∞ −∞ −∞
∞ ∞ ∞
= ··· f X (x)d x a d x b (4.1.11)
−∞ −∞ −∞
p(x) ≥ 0 (4.1.13)
4.1 Distributions of Random Vectors 259
and
p(x) = 1 (4.1.14)
x∈Ω k
for all x = (x1 , x2 , . . . , xk ) ∈ Ω k on a measurable space Ω k , F k , where Ω and F
are a discrete sample space and the corresponding event space, respectively.
Example 4.1.4 Consider the sample space Ω k = Jkn+1 = {0, 1, . . . , n}k and k num-
k
k
bers α j ∈ (0, 1) j=1 such that α j = 1. Then,
j=1
⎧
⎨
k
n
α1x1 α2x2 · · · αkxk , xi = n, xi ∈ Jn+1 ,
p(x) = x1 ,x2 ,...,xk (4.1.16)
⎩ i=1
0, otherwise
Example 4.1.5 Consider the face number from a rolling of a fair die, and let
A = {1, 2, 3}, B = {4, 5}, and C = {6} as in Example 1.4.24. The probability
of the occurrence of four times of A, five times of B, and one time of C is
10 1 4 1 5 1 1
4,5,1 2 3 6
= 648
35
≈ 0.054 from ten rollings. ♦
p X (x) = P (X = x) . (4.1.18)
We now consider two-dimensional random vectors in detail because they are used
more frequently than, and provide insights on, higher dimensional random vectors.
Let FX,Y (x, y) = P(X ≤ x, Y ≤ y) and f X,Y be the joint cdf and pdf, respectively,
of a two-dimensional random vector (X, Y ). Then, the joint pdf can be written as
∂2
f X,Y (x, y) = FX,Y (x, y) (4.1.20)
∂ x∂ y
in terms of the joint cdf FX,Y , and the joint cdf can be expressed as
y x
FX,Y (x, y) = f X,Y (u, v)dudv (4.1.21)
−∞ −∞
in terms of the joint pdf f X,Y . In addition, we can obtain the (marginal) cdf FX (x) =
P(X ≤ x) as
∂
from the joint cdf FX,Y , and the (marginal) pdf f X (x) = ∂x
FX (x) as
∞
f X (x) = f X,Y (x, y)dy (4.1.23)
−∞
from the joint pdf f X,Y . In (4.1.22) and (4.1.23), interchanging the two random
variables X and Y , we have the cdf FY (y) = P(Y ≤ y) of Y as
∞ ∂
and the pdf f Y (y) = −∞ f X,Y (x, y)d x of Y from f Y (y) = ∂y
FY (y).
4.1 Distributions of Random Vectors 261
The function F(x, y) is continuous from the right and non-decreasing. The val-
ues of F(x, y) for x, y → ±∞ satisfy the property of a cdf. However, F(x, y)
is not a cdf because it does not satisfy (4.1.25) or, equivalently, (4.1.31):
for
example, we have F(b, d) − F(a, d) − F(b, c) + F(a, c) = F(1, 1) − F 13 , 1 −
F 1, 13 + F 13 , 13 = 1 − 1 − 1 + 0 = −1 when a = c = 13 and b = d = 1. ♦
Because a joint cdf F(x, y) is a non-decreasing function, its derivative, the joint
pdf f (x, y) satisfies f (x, y) ≥ 0 as described also in (4.1.6). In addition, from
(4.1.21) ∞ F(∞, ∞) = 1 observed in (4.1.30), or as mentioned in (4.1.7), we
∞ and
have −∞ −∞ f (x, y)d xd y = 1.
For a discrete random vector (X, Y ), similar results can be obtained. First, the
joint pmf p X,Y (x, y) = P(X = x, Y = y), satisfying
∞
∞
p X,Y (x, y) = 1, (4.1.33)
x=−∞ y=−∞
262 4 Random Vectors
in terms of the joint pmf and, conversely, the joint pmf p X,Y can be expressed as
in terms of the joint cdf FX,Y when the support of p X,Y (x, y) is a subset of the integer
lattice points {(x, y) : x, y ∈ J} in the two-dimensional space. In addition, from the
joint pmf p X,Y , the pmf p X (x) = P(X = x) of X can be obtained as
∞
p X (x) = p X,Y (x, y) (4.1.36)
y=−∞
∞
and the pmf of Y as pY (y) = P(Y = y) = p X,Y (x, y).
x=−∞
using (4.1.36). ♦
The probability that the random vector (X, Y ) will have a value in a region D is
obtained as
⎧
⎪
⎪ f X,Y (x, y)d xd y, continuous random vector,
⎪
⎨ D
d FX,Y (x, y) = (4.1.39)
⎪
⎪
D ⎪
⎩ p X,Y (x, y), discrete random vector.
(x,y)∈D
4.1 Distributions of Random Vectors 263
r + dr
y + dy r
y θ + dθ
θ
x x + dx
Fig. 4.1 The differential areas dx dy and dr r dθ in the perpendicular and polar coordinate systems,
respectively
Example4.1.8 For a random vector (X, Y) with the joint pdf f X,Y (x, y) =
x 2 +y 2
1
2πσ 2
exp − 2σ 2 , obtain the probability P X 2 + Y 2 ≤ a 2 that (X, Y ) will be
on or inside the circle of radius a and center at the origin.
a 2π 2
We then have P X 2 + Y 2 ≤ a2 = 1
2πσ 2 0 0 exp − 2σr 2 r dθ dr = − exp
2 a
− 2σr 2 , i.e.,
r =0
a2
P X +Y ≤a2 2 2
= 1 − exp − 2 (4.1.41)
2σ
by changing the perpendicular coordinate system (x, y) into the polar coordinate
system (r, θ ) as shown in Fig. 4.1. ♦
Next, let U = |face number of the first die − face number of the second die| and
V = (face number of the first die + face number of the second die) from a rolling
of a pair of dice. Then, we have
1
P (U = 0, V = i) = , i = 2, 4, . . . , 12, (4.1.43)
36
264 4 Random Vectors
2
P (U = 1, V = i) = , i = 3, 5, . . . , 11, (4.1.44)
36
2
P (U = 2, V = i) = , i = 4, 6, 8, 10, (4.1.45)
36
2
P (U = 3, V = i) = , i = 5, 7, 9, (4.1.46)
36
2
P (U = 4, V = i) = , i = 6, 8, (4.1.47)
36
2
P (U = 5, V = 7) = (4.1.48)
36
as the point mass of (U, V ). ♦
Example 4.1.10 The probability of a two dimensional hybrid random vector of
which one element is a discrete random variable and the other is a continuous ran-
dom variable is called a line mass. For example, when X is the face number from
a rolling of a die and Y is a real number chosen randomly in the interval (0, 1),
P (X = x, y1 ≤ Y ≤ y2 ) is a line mass. ♦
Example 4.1.11 Assume that we repeatedly roll a fair die until the number of even-
numbered outcomes is 10. Let N and X i denote the numbers of rolls and outcome i,
respectively, when the rolling ends. Obtain the pmf of X 2 and the pmf of X 1 .
∞
p2 (k) = P (2 at the j-th rolling; among the remaining j − 1 rollings,
j=10
k − 1 times of 2, 10 − k times of 4 or 6, and j − 10 times of 1, 3, or 5)
+P (4 or 6 at the j-th rolling; among the remaining j − 1 rollings,
k times of 2, 9 − k times of 4 or 6, and j − 10 times of 1, 3, or 5)]
∞
k−1 10−k j−10
1 ( j − 1)! 1 1 1
=
j=10
6 (k − 1)!(10 − k)!( j − 10)! 6 3 2
k 9−k j−10
1 ( j − 1)! 1 1 1
+
3 k!(9 − k)!( j − 10)! 6 3 2
1 1 1 1
= +
(k − 1)!(10 − k)! k!(9 − k)! 6k 310−k
∞
( j − 1)! 1
× (4.1.49)
j=10
( j − 10)! 2 j−10
2Here, the events ‘2 occurs k − 1 times’ for k = 0 and ‘4 or 6 occurs 9 − k times’ for k = 10 are
both empty events and thus have probability 0, which can be confirmed from (−1)! → ±∞.
4.1 Distributions of Random Vectors 265
∞
( j−1)! 1
∞
(q+9)! 9!
for k = 0, 1, . . . , 10 because ( j−10)! 2 j−10
= q!9! 2q
= 210 9! from
j=10 q=0
∞
r +x−1 Cx (1 − α)x = α −r shown in (2.5.16). Thus, X 2 ∼ b 10, 13 .
x=0
(Method 2) Among the 10 times of the occurrences of even numbers, 2 can
occur 0, 1, . . ., 10 times and the probability of 2 among {2, 4, 6} is 13 . Thus, the
probability that 2 will occur k times is
k 10−k
1 2
p2 (k) = 10 Ck (4.1.51)
3 3
for k = 0, 1, . . . , 10.
∞
(2) Similarly, recollecting r +x−1 Cx (1 − α)x = α −r shown in (2.5.16), we can
x=0
∞
obtain the pmf p1 (k) = P (X 1 = k) = P (X 1 = k, N = j) of X 1 as
j=10+k
∞
p1 (k) = P(an even number at the j-th rolling; among the remaining
j=10+k
j − 1 rollings, k times of 1, 9 times of even numbers, and
j − k − 10 times of 3 or 5)
∞ k 9 j−k−10
1 ( j − 1)! 1 1 1
= (4.1.52)
j=10+k
2 ( j − k − 10)!k!9! 6 2 3
∞
( j−1)! 1 k 1 9 1 j−k−10
∞
(i+k+9)!
1 k 1 10 1 i
from 1
2 ( j−k−10)!k!9! 6 2 3
= i!k!9! 6 2 3
=
j=10+k i=0
∞
(i+k+9)!
1 i (k+9)! 1 1 2 −k−10 (k+9)! 1 1 1 3
= (k+9)!
k 10 k 10 k 10
i!(k+9)! 3 k!9! 6 2
= 3 k!9! 6 2 k!9! 4 4
.
i=0
Thus, X 1 ∼ NB 10, 43 with the pmf (2.5.14).
♦
266 4 Random Vectors
Definition 4.1.5 (independent random vector) When the joint probability function
of a random vector is the product of the marginal probability functions of the element
random variables, the random vector is called an independent random vector.
In other words, if the joint cdf FX , joint pdf f X , or joint pmf p X of a random
vector X = (X 1 , X 2 , . . . , X n ) can be expressed as
n
FX (x) = FX i (xi ) , (4.1.54)
i=1
n
f X (x) = f X i (xi ) , (continuous random vector), (4.1.55)
i=1
or
n
p X (x) = p X i (xi ) , (discrete random vector), (4.1.56)
i=1
respectively, for every real vector x, the random vector X is an independent random
vector. The random variables of an independent random vector are all independent of
each other (Burdick 1992; Davenport 1970; Dawid 1979; Geisser and Mantel 1962;
Gray and Davisson 2010; Papoulis and Pillai 2002; Wang 1979).
Example 4.1.12 Assume the joint pmf
1
, for (u, v) = (1, 1), (2, 4), (3, 9),
p X,Y (u, v) = 3 (4.1.57)
0, otherwise
of (X, Y )
discussed in Example 4.1.7. Then, using (4.1.36), we can obtain the pmf
pY (w) = p X,Y (u, w) = p X,Y (1, w) + p X,Y (2, w) + p X,Y (3, w) of Y as
u
p X,Y (1, 1), w = 1; p X,Y (2, 4), w = 4;
pY (w) =
p X,Y (3, 9), w = 9; 0, otherwise
1
, w = 1, 4, or 9,
= 3 (4.1.58)
0, otherwise.
It is clear that p X,Y (u, w) = p X (u) pY (w) from (4.1.38) and (4.1.58). Thus, (X, Y )
is not an independent random vector. ♦
Example 4.1.13 A needle of length 2a is tossed at random on a plane ruled with
infinitely many parallel lines of an infinite length, where the distance between adja-
4.1 Distributions of Random Vectors 267
2a
Θ
2a
θT
cent lines is 2b as shown in Figs. 4.2 and 4.3. Assuming that the thickness of the
needle is negligible, find the probability PB that the needle touches one or more of
the parallel lines when a ≤ b.
Solution Let us denote by X the distance from the center of the needle to the near-
est parallel line and by Θ the smaller angle that the needle makes with the lines.
Then, X ∼ U [0, b), Θ ∼ U [0, π2 ), and X and Θ are independent. Thus, the joint
pdf f X,Θ (x, θ ) = f X (x) f Θ (θ ) of (X, Θ) can be expressed as
2 π
f X,Θ (x, θ ) = , 0 ≤ x < b, 0 ≤ θ < . (4.1.59)
πb 2
Now, when a ≤ b, recollecting that {(x, θ ) : x ≤ a sin θ } = (x, θ ) : 0 ≤ θ < π2 ,
π2
0 ≤ x ≤ a sin θ }, we get3 PB = P((X, Θ) : X < a sin Θ) = πb 2
0 a sin θ dθ as
2a
PB = . (4.1.60)
πb
The probability (4.1.60) is proportional to the length of the needle and inversely
proportional to the interval of the parallel lines, which is also appealing intuitively.
When a → 0 or b → ∞, we have PB → 0. ♦
3 {(x, θ) : x < a sin θ} = (x, θ) : 0 ≤ x ≤ a, sin−1 ax ≤ θ < π2 , the result
Considering
a π a
(4.1.60) can be obtained also as PB = π2b 0 sin2 −1 x dθd x = π2b 0 π2 − sin−1 ax d x =
a −1 x π2
a
π
2
b − π b 0 sin a d x = b − π b 0 t cos tdt = b − π b (t sin t + cos t) t=0 = π b .
a 2 a 2a a 2a 2a
268 4 Random Vectors
3π 6 + π − 3 3 ≈
2
0.8372
0 1 2 a
b
π −1 b sin−1 b
2a sin a
− sin−1 ax d x = 1 − πb 0 t cos tdt = 1 − πb
2a
(t sin t + cos t)t=0 a = 1 −
2 √
sin−1 ab + πb − 2 aπb−b with the interval of integration (x, θ ) : 0 ≤ x ≤
2 2a 2 2
π
b, sin−1 ax ≤ θ < π2 . ♦
It is easy to see that PB → π2 when a → b from (4.1.60) and (4.1.61) and that
PB → 1 when b is finite and a → ∞ from (4.1.61). Figure 4.4 shows the probability
PB as a function of ab .
n
FX (x) = FX (xi ) , (4.1.62)
i=1
the pdf
n
f X (x) = f X (xi ) (4.1.63)
i=1
4.1 Distributions of Random Vectors 269
n
p X (x) = p X (xi ) (4.1.64)
i=1
Definition 4.1.7 (two random vectors independent of each other) Consider two ran-
dom vectors X = (X 1 , X 2 , . . . ,X n ) and Y = (Y1 , Y2 , . . . , Ym). When the joint cdf
FX d ,Y d of two subvectors X d = X t1 , X t2 , . . . , X tk and Y d = Ys1 , Ys2 , . . . , Ysl sat-
isfies
FX d ,Y d x d , yd = FX d x d FY d yd (4.1.65)
when X and Y are discrete random vectors, for all subvectors X d of X and Y d of Y .
We can similarly define the independence among several random vectors by gen-
eralizing Definition 4.1.7. Note that the independence between X and Y has nothing
to do with if X or Y is an independent random vector. In other words, even when X
and Y are independent of each other, X or Y may or may not be independent random
vectors, and even when X and Y are both independent random vectors, X and Y may
or may not be independent of each other.
Proof For convenience, we show the result for the simplest case of m = n =
1. Let A y1 and B y2 be the inverse images of {Y1 ≤ y1 } for Y1 = g (X 1 ) and
{Y2 ≤ y2 } for Y2 = h (X 2 ), respectively.
we have the joint cdf FY1 ,Y2 (y1 , y2 ) =
Then,
P X 1 ∈ A y1 , X 2 ∈ B y2 =P X 1 ∈ A y1 P X 2 ∈ B y2 = FY1 (y1 ) FY2 (y2 ) because
P (Y1 ≤ y1 , Y2 ≤ y2 ) = P X 1 ∈ A y1 , X 2 ∈ B y2 . ♠
Example 4.1.16 (Stoyanov 2013) We can easily show that, if g(X ) and h(Y ) are
independent of each other and g and h are both one-to-one correspondences, then
X and Y are also independent of each other. On the other hand, the converse of
Theorem 4.1.3 does not hold true in general. In other words, when g or h is a
continuous function but is not a one-to-one correspondence, X and Y may not be
independent of each other even when g(X ) and h(Y ) are independent of each other.
Let us consider one such example of discrete random variables.
Assume the joint pmf
3 5 3
p X,Y (−1, −1) = , p X,Y (0, −1) = , p X,Y (1, −1) = ,
32 32 32
5 8 3
p X,Y (−1, 0) = , p X,Y (0, 0) = , p X,Y (1, 0) = , (4.1.68)
32 32 32
1 3 1
p X,Y (−1, 1) = , p X,Y (0, 1) = , p X,Y (1, 1) =
32 32 32
of (X, Y ). Then, p X (−1) = p X,Y (−1, −1) + p X,Y (−1, 0) + p X,Y (−1, 1) =
9
32
, and similarly p X (0) = 21 and p X (1) = 32 7
. In addition, pY (−1) =
p X,Y (−1, −1) + p X,Y (0, −1) + p X,Y (1, −1) = 32 , pY (0) = 21 , and pY (1) = 32
11 5
.
Thus, p X,Y (−1, −1) = 32 = 322 = p X (−1) pY (−1) and X and Y are not indepen-
3 99
In this section, we will focus on the distributions of functions (Horn and Johnson
1985; Leon-Garcia 2008) of random vectors in terms of the cdf, pdf, and pmf.
4.2 Distributions of Functions of Random Vectors 271
f Y ( y) |d y| = f X (x) |d x| . (4.2.1)
In other words, the probability f X (x) |d x| that X is in |d x| will be the same as the
probability f Y ( y) |d y| that Y is in |d y|, a type of conservation laws.
Theorem 4.2.1
∂ Denote the Jacobian4 of g(x) = (g1 (x), g2 (x), . . . , gn (x)) by
J (g(x)) = ∂ x g(x), i.e.,
∂g ∂g
1 1 · · · ∂g1
∂ x1 ∂ x2 ∂ xn
∂g2 ∂g2 ∂g
∂ x1 ∂ x2 · · · ∂ xn2
J (g(x)) = . . . . . (4.2.2)
.. .. . . ..
∂gn ∂gn
∂ x1 ∂ x2
· · · ∂∂gxnn
Q Q
j j j
where x j ( y) j=1
= x1 ( y), x2 ( y), . . . , xn ( y) are the solutions to the
j=1
simultaneous equations g1 (x) = y1 , g2 (x) = y2 , . . . , gn (x) = yn , i.e., {xi }i=1
n
4 The Jacobian is also referred to as the transformation Jacobian or Jacobian determinant. In addition,
∂g
1 ∂g2 · · · ∂gn
∂ x1 ∂ x1 ∂ x1
∂g1 ∂g2 · · · ∂gn
∂ x2 ∂ x2 ∂ x2
from the property of determinant, we also have J (g(x)) = . .. . . .. .
. . . .
∂g ∂g.
1 2
· · · ∂gn
∂ xn ∂ xn ∂ xn
272 4 Random Vectors
x2 y2
1
1
VY 2 3
−1 0 y1
VX
−1
0 1 x1
Fig. 4.5 Supports VX and VY of the joint pdf’s f X (x1 , x2 ) and f Y (y1 , y2 ), respectively, when the
transformation is (Y1 , Y2 ) = (3X 1 − X 2 , −X 1 + X 2 )
1
3
y2
−1 1 2 3 y1 −1 1
Fig. 4.6 The pdf’s f Y1 (y1 ) of Y1 = 3X 1 − X 2 and f Y2 (y2 ) of Y2 = −X 1 + X 2 when the joint pdf
of X = (X 1 , X 2 ) is f X (x1 , x2 ) = u (x1 ) u (1 − x1 ) u (x2 ) u (1 − x2 )
⎧ 2−y
⎪
⎨ 2 −3y2 dy1 = 1 + y2 , −1 < y2 ≤ 0,
1 2
−3y2 +2
f Y2 (y2 ) = 1 −y dy1 = 1 − y2 , 0 < y2 ≤ 1, (4.2.8)
⎪
⎩2 2
0, otherwise
of Y2 by integrating the joint pdf f Y (y1 , y2 ). Figure 4.6 shows the pdf’s f Y1 (y1 ) and
f Y2 (y2 ). ♦
When m < n, the joint pdf of Y can be obtained from Theorem 4.2.1 by employing
auxiliary variables: details will be discussed in Sect. 4.2.2. It is not possible to obtain
the joint pdf of g(X) for m > n except in very special cases as discussed in Sect.
4.5. When J (g(x)) = 0 and m = n, we cannot use Theorem 4.2.1 in obtaining the
joint pdf of g(X) because the denominator in (4.2.3) is 0: in Sect. 4.5, we briefly
discuss with some examples on how we can deal with such cases.
274 4 Random Vectors
Note that we have J g −1 ( y) = J (g(x)) 1
. The Jacobian J (g(x)) of g(x) is written
−1
∂y
also as J ( y) or ∂ x , and the Jacobian J g ( y) is expressed also as J (x) or ∂∂ xy .
if det A = | A| = 0. ♦
Example 4.2.3 When X ∼ G (α1 , β) and Y ∼ G (α2 , β) are independent of each
other, show that Z = X + Y and S = X +Y
X
are independent of each other, and obtain
the pdf of S.
Solution Expressing X and Y as X = Z S and Y = Z − Z S in terms of Z and S,
−1 ∂ −1 s z
we have the Jacobian J g (z, s) = ∂(z,s) g (z, s) = = −z of the
1 − s −z
−1
inverse transformation (X, Y ) = g (Z , S) = (Z S, Z − Z S). Thus, the joint pdf
f Z ,S (z, s) = |z| f X,Y (x, y)x=zs, y=z−zs of (Z , S) is
|z|(zs)α1 −1 (z − zs)α2 −1 zs z − zs
f Z ,S (z, s) = exp − exp −
β α1 +α2 Γ (α1 ) Γ (α2 ) β β
× u(zs)u(z − zs)
z α1 +α2 −1 z
= exp − u(z)
β α1 +α2 Γ (α1 + α2 ) β
Γ (α1 + α2 ) α1 −1
× s (1 − s)α2 −1 u(1 − s)u(s) . (4.2.11)
Γ (α1 ) Γ (α2 )
Γ (α1 +α2 ) α1 −1
Γ (α1 )Γ (α2 )
s (1 − s)α2 −1 u(1 − s)u(s). Note that S = X
X +Y
∼ B (α1 , α2 ). ♦
4.2 Distributions of Functions of Random Vectors 275
Example 4.2.4 Obtain the pdf of Y = X 1 + X 2 for X = (X 1 , X 2 ) with the joint pdf
fX.
from (4.2.13). In other words, the pdf of the sum of two random variables independent
of each other is the convolution
f X +Y = f X ∗ f Y (4.2.15)
Example 4.2.5 (Thomas 1986) Obtain the pdf of X 1 , the pdf of X 2 , and the pdf
of Y1 = X 1 + X 2 for a random vector (X 1 , X 2 ) with the joint pdf f X 1 ,X 2 (x1 , x2 ) =
24x1 (1 − x2 ) u (x1 ) u (x2 − x1 ) u (1 − x2 ).
276 4 Random Vectors
x2 y2
fY1 (y1 )
1 1 1
B
A
y1
x1 y1
1 1 2 0 1 2
Fig. 4.7 The region A = {(x1 , x2 ) : u (x1 ) u (x2 − x1 ) u (1 − x2 ) > 0} = (x1 , x2 ) : 0 ≤ x1 ≤
x2 , 0 ≤ x2 ≤ 1 and the corresponding region B = (y1 , y2 ) : u (y1 − y2 ) u (2y2 − y1 )
u (1 − y2 ) > 0 = (y1 , y2 ) : y2 ≤ y1 ≤ 2y2 , 0 ≤ y2 ≤ 1 of the transformation (y1 , y2 ) =
(x1 + x2 , x2 ). The pdf f Y1 (y1 ) of Y1 = X 1 + X 2 when the joint pdf of X 1 and X 2 is f X 1 ,X 2
(x1 , x2 ) = 24x1 (1 − x2 ) u (x1 ) u (x2 − x1 ) u (1 − x2 )
5
∞ ∞
More specifically, we have f X 1 (x1 ) = −∞ f X 1 ,X 2 (x1 , x2 ) d x2 = −∞ 24x1 (1 − x2 ) u (x2 − x1 )
1
u (1 − x2 ) d x2 u (x1 ) = x 24x1 (1 − x2 ) d x2 u (x1 ) u (1 − x1 ) = 12x1 (1 − x1 )2 u (x1 ) u (1 − x1 ).
1 ∞ ∞
6 More specifically, we have f X 2 (x2 ) = f X 1 ,X 2 (x1 , x2 ) d x1 = −∞ 24x1 (1 − x2 ) u (x1 ) u (x2 − x1 )
x2
−∞
d x1 u (1 − x2 ) = 0 24x1 (1 − x2 ) d x1 u (x2 ) u (1 − x2 ) = 12x2 (1 − x2 )u (x2 ) u (1 − x2 ).
2
4.2 Distributions of Functions of Random Vectors 277
of which the support is the region B shown in Fig. 4.7. Now, we can
∞
obtain the pdf f Y1 (y1 ) = −∞ f Y1 ,Y2 (y1 , y2 ) dy2 of Y1 = X 1 + X 2 as f Y1 (y1 ) =
y1 1
1
y
24 (y1 − y2 ) (1 − y2 ) dy2 for 0 ≤ y1 ≤ 1 and f Y1 (y1 ) = 1 y1 24 (y1 − y2 ) 1 −
2 1 2
from integration. ♦
Example 4.2.7 When the joint pdf of X and Y is f X,Y (x, y) = 41 u(x)u(2 −
x)u(y)u(2 − y), obtain the pdf of V = X − Y .
7 Because u (y1 − y2 ) u (2y2 − y1 ) is non-zero only when {(y1 , y2 ) : y2 < y1 < 2y2 } =
∞
{y1 : y2 < y1 < 2y2 } ∩ {y2 : y2 > 0}, we have f Y2 (y2 ) = −∞ 24 (y1 − y2 ) (1 − y2 )
2y2
u (y1 − y2 ) u (2y2 − y1 ) u (1 − y2 ) dy1 = (1 − y2 ) y2 24 (y1 − y2 ) u (1 − y2 ) u (y2 ) dy1 =
"2y
24 (1 − y2 ) 21 y12 − y2 y1 y 2 u (1 − y2 ) u (y2 ) = 12 (1 − y2 ) y22 u (1 − y2 ) u (y2 ), which is the
2
same as (4.2.17): note that we have chosen Y2 = X 2 .
278 4 Random Vectors
−2 0 y=0 2 v
i.e.,
⎧ v+2
⎨ 4 , −2 < v < 0,
f V (v) = − v−2 , 0 < v < 2, (4.2.23)
⎩ 4
0, otherwise,
for which Fig. 4.8 can be used to identify the upper and lower limits of the integration.
♦
Example 4.2.8 Obtain the pdf of Z = X Y for a random vector (X, Y ) with the joint
pdf f X,Y .
∂w ∂w
Thus, we have f Z ,W (z, w) = |w|
1
f X,Y (w, wz ). We can then obtain the pdf f Z (z) =
∞
−∞ f Z ,W (z, w)dw of Z = X Y as
∞ z 1
f Z (z) = f X,Y x, dx
−∞ x |x|
∞
z 1
= f X,Y ,y dy (4.2.24)
−∞ y |y|
after integration. ♦
Example 4.2.9 When X 1 and X 2 are independent of each other with the identical
pdf f (x) = u(x)u(1 − x), obtain the pdf of Z = X 1 X 2 .
0 1 z
e−1
Now, noting that the integral of (4.2.25) is non-zero only when (y, z) : 0 < y <
1, 0 < yz < 1 = {(y, z) : 0 < z < y < 1} = {(y, z) : z < y < 1, 0 < z < 1}, we
1
have f Z (z) = z 1y dyu(z)u(1 − z), i.e.,
from integration. ♦
Example 4.2.11 Obtain the pdf of Z = YX when X and Y are i.i.d. with the marginal
pdf f (x) = u(x)u(1 − x).
∞
Solution We have f Z (z) = −∞ |y|u(zy)u(y)u(1 − zy)u(1 − y)dy from (4.2.27).
Next,
noting that {(y, z) : zy> 0, y > 0, 1 − zy > 0, 1 − y > 0} is the same as
(y, z) : z > 0, 0 < y < min 1, 1z , the pdf of Z = YX can be obtained as f Z (z) =
min(1, 1z )
0 y dy u(z), i.e.,
⎧
⎨ 0, z < 0,
f Z (z) = 21 , 0 < z ≤ 1, (4.2.28)
⎩ 1
2z 2
, z ≥ 1.
280 4 Random Vectors
1
8
0 1 2 z
Note that the value f Z (0) is not, and does not need to be, specified. Figure 4.10 shows
the pdf (4.2.28) of Z = YX . ♦
Theorem 4.2.1 is useful when we obtain the joint pdf f Y of Y = g (X) directly from
the joint pdf f X of X. In some cases, it is more convenient and easier to obtain the
joint pdf f Y after first obtaining the joint cdf FY ( y) = P (Y ≤ y) = P (g (X) ≤ y)
as
FY ( y) = P X ∈ A y . (4.2.29)
also when w < 0. Changing the integration in the perpendicular coordinate system
into that in the polar coordinate system as indicated in Fig. 4.1
and2 noting the
θw z
symmetry of f X,Y , we get FZ ,W (z, w) = 2 θ=− π r =0 2πσ 2 exp − 2σr 2 r dr dθ =
1
# 2 $z 2
π
1
πσ 2
θw + 2 −σ exp − 2σ 2
2 r
, i.e.,
r =0
1 z2
FZ ,W (z, w) = π + 2 tan−1 w 1 − exp − 2 , (4.2.32)
2π 2σ
where θw = tan−1 w ∈ − π2 , π2 . ♦
Recollect that we obtained the total probability theorems (2.4.13), (3.4.8), and
(3.4.9) based on P(A|B)P(B) = P(AB) derived from (2.4.1). Now, extending the
results into the multi-dimensional space, we similarly8 have
⎧
⎨ all x P(A|X = x) f X (x)d x, continuous random vector X,
⎪
P(A) = (4.2.33)
⎪
⎩ P(A|X = x) p X (x), discrete random vectorX,
all x
which are useful in obtaining the cdf, pdf, and pmf in some cases.
Solution This problem has already been discussed in Example 4.2.4 based on the
pdf. We now consider the problem based on the cdf.
(Method 1) Recollecting (4.2.33), the cdf FY (y) = P (X 1 + X 2 ≤ y) of Y can be
expressed as
∞ ∞
FY (y) = P ( X 1 + X 2 ≤ y| X 1 = x1 , X 2 = x2 )
−∞ −∞
f X (x1 , x2 ) d x1 d x2 . (4.2.34)
8 Conditional distribution in random vectors will be discussed in Sect. 4.4 in more detail.
282 4 Random Vectors
x2 X1 + X2 = y
FY (y) = f X (x1 , x2 ) d x1 d x2
x1 +x2 ≤y
∞ y−x2
= f X (x1 , x2 ) d x1 d x2 . (4.2.36)
−∞ −∞
∂
Then, the pdf f Y (y) = ∂y
FY (y) of Y is
∞
f Y (y) = f X (y − x2 , x2 ) d x2 , (4.2.37)
−∞
∞
which can also be expressed as f Y (y) = −∞ f X (x1 , y − x1 ) d x1 . In obtaining
y−x
(4.2.37), we used ∂∂y −∞ 2 f X (x1 , x2 ) d x1 = f X (y − x2 , x2 ) from Leibnitz’s rule
(3.2.18).
(Method 2) Referring to the region A = {(X 1 , X 2 ) : X 1 + X 2 ≤ y} shown in
Fig. 4.11, the value x1 of X 1 runs from −∞ to y − x2 when the value of x2 of
X 2 runs from −∞ to ∞. Thus we have FY (y) = P(Y ≤ y) = P (X 1 + X 2 ≤ y) =
9
f X (x1 , x2 ) d x1 d x2 , i.e.,
A
∞ y−x2
FY (y) = f X (x1 , x2 ) d x1 d x2 , (4.2.38)
x2 =−∞ x1 =−∞
Example 4.2.14 Obtain the pdf of Z = X + Y when X with the pdf f X (x) =
αe−αx u(x) and Y with the pdf f Y (y) = βe−βy u(y) are independent of each other.
Solution We first obtain the pdf of Z directly. The joint pdf of X and Y is
f X,Y (x, y) = f X (x) f Y (y) = αβe−αx e−βy u(x)u(y). Recollecting that u(y)u(z − y)
∞ y−x2
9If the order of integration is interchanged, then x 2 =−∞ x 1 =−∞ f X (x1 , x2 ) d x1 d x2 will become
∞ y−x1
x 1 =−∞ x 2 =−∞ f X (x 1 , x 2 ) d x 2 d x 1 .
4.2 Distributions of Functions of Random Vectors 283
from (4.2.37).
Next, the cdf of Z can be expressed as
∞ z−y
FZ (z) = αβ e−αx e−βy u(x)u(y) d xd y (4.2.40)
−∞ −∞
based on (4.2.38). Here, (4.2.40) is non-zero only when {x > 0, y > 0, z − y > 0}
due to u(x)u(y). With this fact in mind and by noting that {y > 0, z − y > 0} =
{z > y > 0}, we can rewrite (4.2.40) as
z z−y
FZ (z) = αβ e−αx e−βy d xd y u(z)
0 0
z
=β 1 − e−α(z−y) e−βy dy u(z)
0 −αz
1 − β−α
1
βe − αe−βz u(z), β = α,
= (4.2.41)
1 − (1 + αz) e−αz u(z), β = α.
Example 4.2.15 For a continuous random vector (X, Y ), let Z = max(X, Y ) and
W = min(X, Y ). Referring to Fig. 4.12, we first have FZ (z) = P(max(X, Y ) ≤ z) =
P(X ≤ z, Y ≤ z), i.e.,
z X
284 4 Random Vectors
w X
which can also be obtained intuitively from Fig. 4.13. Note that the pdf f Z (z) =
d
F (z, z) of Z = max(X, Y ) becomes
dz X,Y
when X and Y are i.i.d. with the marginal cdf F and marginal pdf f . ♦
Considering that the pmf, unlike the pdf, represents a probability, we now discuss
functions of discrete random vectors.
4.2 Distributions of Functions of Random Vectors 285
Example 4.2.16 (Rohatgi and Saleh 2001) Obtain the pmf of Z = X + Y and the
pmf of W = X − Y when X ∼ b(n, p) and Y ∼ b(n, p) are independent of each
other.
n
Solution First, the pmf P(Z = z) = P(X = k, Y = z − k) of Z = X + Y
k=0
n
can be obtained as P(Z = z) = n Ck p
k
(1 − p)n−k n Cz−k p z−k (1 − p)n−z+k =
k=0
n
n Ck n Cz−k p
z
(1 − p)2n−z , i.e.,
k=0
P(Z = z) = 2n Cz p
z
(1 − p)2n−z (4.2.47)
n
for z = 0, 1, . . . , 2n, where we have used n Ck n Cz−k = 2n Cz based on (1.A.25).
k=0
n
n
Next, the pmf P(W = w) = P(X = k + w, Y = k) = n Ck+w n Ck p
2k+w
(1 −
k=0 k=0
p)2n−2k−w of W = X − Y can be obtained as
w
n
p
P(W = w) = n Ck+w n Ck p
2k
(1 − p)2n−2k (4.2.48)
1− p k=0
for w = −n, −n + 1, . . . , n. ♦
Example 4.2.17 Assume that X and Y are i.i.d. with the marginal pmf p(x) =
(1 − α)α x−1 ũ(x − 1), where 0 < α < 1 and ũ(x) is the discrete space unit step
function defined in (1.4.17). Obtain the joint pmf of (X + Y, X ), and based on the
result, obtain the pmf of X and the pmf of X + Y .
Solution First we have p X +Y,X (v, x) = P(X + Y = v, X = x) = P(X = x, Y =
v − x), i.e.,
∞
Thus, we have p X +Y (v) = p X +Y,X (v, x), i.e.,
x=−∞
∞
p X +Y (v) = (1 − α)2 α v−2 ũ(x − 1)ũ(v − x − 1) (4.2.50)
x=−∞
∞
Next, the pmf of X can be obtained as p X (x) = p X +Y,X (v, x), i.e.,
v=−∞
∞
p X (x) = (1 − α)2 α −2 α v ũ(x − 1)ũ(v − x − 1) (4.2.52)
v=−∞
∞
from (4.2.49), which can be rewritten as p X (x) = (1 − α)2 α −2 α v ũ(x − 1),
v=x+1
i.e.,
For random vectors, we will describe here the basic properties (Balakrishnan 1992;
Kendall and Stuart 1979; Samorodnitsky and Taqqu 1994) of expected values. New
notions will also be defined and explored.
The expected values for random vectors can be described as, for example,
by extending the notion of the expected values discussed in Chap. 3 into multiple
dimensions. Because the expectation is a linear operator, we have
n
n
E ai gi (X i ) = ai E {gi (X i )} (4.3.2)
i=1 i=1
Example 4.3.1 Assume that we repeatedly roll a fair die until the number of even-
numbered outcomes is 10. Let N denote the number of rolls and X i denote the
number of outcome i when the repetition ends. Obtain the pmf of N , expected value
of N , expected value of X 1 , and expected value of X 2 .
for k = 10, 11, . . ., where Ak = {an even number at the k-th rolling} and B =
∞
{9 times of even numbers until (k − 1)-st rolling}. Using r +x−1 Cx (1 − α) =
x
x=0
(k−1)!
α −r shown in (2.5.16) and noting that k k−1 C9 = k (k−10)!9! = 10 (k−10)!10!
k!
= 10 k C10 ,
∞ 1 k
∞ 1 k
∞ 1 j
we get E{N } = k−1 C9 2 k = 10 k C10 2 = 21010 j+10 C10 2 =
k=10 k=10 j=0
1 −11
10
= 20. This result can also be obtained from the formula (3.E.27) of
210 2
the mean of the NB distribution with the pmf (2.5.17) by using (r, p) = 10, 21 .
Subsequently, until the end, even numbers will occur 10 times, among which 2, 4,
6
and 6 will occur equally likely. Thus, E {X 2 } = 10
3
. Next, from N = X i , we get
i=1
6
E{N } = E {X i }. Here, because E {X 2 } = E {X 4 } = E {X 6 } = 10
3
and E {X 1 } =
i=1
E {X 3 } = E {X 5 }, the expected value11 of X 1 is E {X 1 } = 20 − 3 × 10
3
1
3 = 10
3 . ♦
We now generalize the concept of moments discussed in Chap. 3 for random vectors.
The moments for bi-variate random vectors will first be considered and then those
for higher dimensions will be discussed.
11The expected values of X 1 and X 2 can of course be obtained with the pmf’s of X 1 and X 2
obtained already in Example 4.1.11.
288 4 Random Vectors
Definition 4.3.1 (joint moment; joint central moment) The expected value
m jk = E X j Y k (4.3.5)
is termed the ( j, k)-th joint central moment or product central moment of X and Y ,
for j, k = 0, 1, . . ., where m X and m Y are the means of X and Y , respectively.
It is easy to see that m 00= μ00 = 1, m 10 = m X = E{X }, m 01 = m Y = E{Y },
m 20 = E X 2 , m 02 = E Y 2 , μ10 = μ01 = 0, μ20 = σ X2 is the variance of X , and
μ02 = σY2 is the variance of Y .
Example 4.3.2 The expected value E X 1 X 23 is the (1, 3)-rd joint moment of X =
(X 1 , X 2 ). ♦
Definition 4.3.2 (correlation; covariance) The (1, 1)-st joint moment m 11 and the
(1, 1)-st joint central moment μ11 are termed the correlation and covariance, respec-
tively, of the two random variables. The ratio of the covariance to the product of the
standard deviations of two random variables is termed the correlation coefficient.
The correlation m 11 = E{X Y } is often denoted12 by R X Y , and the covariance
μ11 = E {(X − m X ) (Y − m Y )} = E{X Y } − m X m Y by K X Y , Cov(X, Y ), or C X Y .
Specifically, we have
K XY = RXY − m X mY (4.3.7)
K XY
ρX Y = % (4.3.8)
σ X2 σY2
12When there is more than one subscript, we need commas in some cases: for example, the joint
pdf f X,Y of (X, Y ) should be differentiated from the pdf f X Y of the product X Y . In other cases, we
do not need to use commas: for instance, R X Y , μ jk , K X Y , . . . denote relations among two or more
random variables and thus is expressed without any comma.
4.3 Expected Values and Joint Moments 289
Theorem 4.3.1 If two random variables are independent of each other, then they
are uncorrelated, but the converse is not necessarily true.
In other words, there exist some uncorrelated random variables that are not inde-
pendent of each other. In addition, when two random variables are independent and
at least one of them has mean 0, the two random variables are orthogonal.
Proof From the Cauchy-Schwarz inequality E2 {X Y } ≤ E X 2 E Y 2 shown in
(6.A.26), we get
E2 {(X − m X ) (Y − m Y )} ≤ E (X − m X )2 E (Y − m Y )2 , (4.3.9)
Example 4.3.3 When the two random variables X and Y are related by Y − m Y =
c (X − m X ) or Y = cX + d, we have |ρ X Y | = 1. ♦
K X = R X − m X m XH (4.3.11)
Y = L X, (4.3.12)
T
where Y = (Y1 , Y2 , . . . , Yn )T . Then, letting m X = m X 1 , m X 2 , . . . , m X n be the
T
mean vector of X, the mean vector mY = E {Y } = m Y1 , m Y2 , . . . , m Yn =
E {L X} = LE {X} of Y can be obtained as
mY = L m X . (4.3.13)
RY = L R X L H , (4.3.14)
and the covariance matrix K Y = E (Y − mY ) (Y − mY ) H = R Y − mY mYH =
L R X − m X m XH L H of Y can be expressed as
KY = L K X LH. (4.3.15)
X L Y = LX
{mX , RX , K X } mY = L mX , RY = L RX LH ,
K Y = L K X LH
Fig. 4.14 The mean vector, correlation matrix, and covariance matrix of linear transformation
Theorem 4.3.4 The correlation and covariance matrices of any random vector are
positive semi-definite.
i = j.
and
1, k = 0,
δk = (4.3.18)
0, k = 0
are called the Kronecker delta function. In some cases, an uncorrelated random vector
is referred to as a linearly independent random vector. In addition, a random vector
X = (X 1 , X 2 , . . . , X n ) is called a linearly dependent random vector if there exists a
vector a = (a1 , a2 , . . . , an ) = 0 such that a1 X 1 + a2 X 2 + · · · + an X n = 0.
Definition
4.3.6 (random
vectors uncorrelated with each other) When we have
E X i Y j∗ = E {X i } E Y j∗ for all i and j, the random vectors X and Y are called
uncorrelated with each other.
Note that even when X and Y are uncorrelated with each other, each of X and Y
may or may not be an uncorrelated random vector, and even when X and Y are both
uncorrelated random vectors, X and Y may be correlated.
Theorem 4.3.5 (McDonough and Whalen 1995) If the covariance matrix of a ran-
dom vector is positive definite, then the random vector can be transformed into an
uncorrelated random vector via a linear transformation.
aiH a j = δi j . (4.3.20)
A = (a1 , a2 , . . . , an ) H
"
= ai∗j (4.3.21)
X λ̃A Z = λ̃AX
K X , |K X | > 0, {λi }n
i=1
λ̃ = diag √1 , √1 , · · · , √1 KZ = I
λ1 λ2 λn
K X ai = λ i ai
A = (a1 , a2 , · · · , an )H
aH
i aj = δij
K Y = (a1 , a2 , . . . , an ) H (λ1 a1 , λ2 a2 , . . . , λn an )
"
= λi aiH a j
= diag (λ1 , λ2 , . . . , λn ) (4.3.22)
Let us proceed one step further from Theorem 4.3.5. Recollecting that the eigen-
values {λi }i=1
n
of the covariance matrix K X are all larger than 0, let
1 1 1
λ̃ = diag √ , √ , . . . , √ (4.3.23)
λ1 λ2 λn
Z = λ̃Y (4.3.24)
H
of Y . Then, the covariance matrix K Z = λ̃K Y λ̃ of Z is
K Z = I. (4.3.25)
Solution
From the characteristic equation
|λI − K X | = 0, we get the two
pairs λ1 = 25, a1 = √2 (1 1)
1 T
and λ2 = 1, a2 = √12 (1 − 1)T of eigen-
value
& and unit
' eigenvector of K X . With the linear transformation C = λ̃ A =
√1
0 1 1 1 1
√1 25 = 50√1
constructed from the two pairs, the covari-
2 0 √11 1 −1 5 −5
294 4 Random Vectors
1 1 25 5
ance matrix K W = C K X C H of W = C X is K W = 50 1
=
5 −5 25 −5
10
. In other words, C is a linear transformation that decorrelates X into an
01
1 1
uncorrelated unit-variance random vector. Note that A = 2 √1
is a unitary
1 −1
matrix.
−2 3
Meanwhile, for B = 5 1
, the covariance matrix of Y = B X is K Y =
3 −2
−2 3 10 15 10
B K X B H = 25 1
= . In other words, like C, the trans-
3 −2 15 10 01
formation B also decorrelates X into an uncorrelated unit-variance random vec-
tor.In addition, from C K X C H = I and B K X B H = I, we get C H C = B H B =
13 −12
1
= K −1
X . ♦
25 −12 13
By extending the notion of the cf and mgf discussed in Sect. 3.3.4, we introduce and
discuss the joint cf and joint mgf of multi-dimensional random vectors.
Definition 4.3.7 (joint cf) The function
ϕ X (ω) = E exp jω T X (4.3.26)
n
ϕ X (ω) = ϕ X i (ωi ) (4.3.31)
i=1
of marginal cf’s from Theorem 4.1.3. Because cf’s and distributions are related by
one-to-one correspondences as we discussed in Theorem 3.3.2, a random vector
whose joint cf is the product of the marginal cf’s is an independent random vector.
n
Example 4.3.8 For an independent random vector X, let Y = X i . Then, the cf
jωY jωX jωX i=1
ϕY (ω) = E e = E e 1 e 2 · · · e jωX n of Y can be expressed as
n
ϕY (ω) = ϕ X i (ω). (4.3.32)
i=1
By inverse transforming the cf (4.3.32), we can get the pdf f Y (y) of Y , which is the
convolution
296 4 Random Vectors
f X 1 +X 2 +···+X n = f X 1 ∗ f X 2 ∗ · · · ∗ f X n (4.3.33)
of the marginal pdf’s. The result (4.2.15) is a special case of (4.3.33) with
n = 2. ♦
n
n
Example 4.3.9 Show that Y = Xi ∼ P λi when {X i ∼ P (λi )}i=1
n
are
i=1 i=1
independent of each other.
Solution The cf for the distribution P (λi ) is ϕ X i (ω) = exp λi e jω − 1 . Thus,
n n
the cf ϕY (ω) = exp λi e jω − 1 of Y = X i can be expressed as
i=1 i=1
& '
n
jω
ϕY (ω) = exp λi e −1 (4.3.34)
i=1
Theorem 4.3.6 When X and Y are independent, g(X ) and h(Y ) are uncorrelated.
However, when X and Y are uncorrelated but not independent, g(X ) and h(Y ) are
not necessarily uncorrelated.
Proof
∞ When
∞
X and Y are independent of ∞ E {g(X )h(Y )} =
∞each other, we have
−∞ −∞ g(x)h(y) f X,Y (x, y)d xd y = −∞ g(x) f X (x)d x −∞ h(y) f Y (y)dy =
E{g(X )}E{h(Y )} and thus g(X ) and h(Y ) are uncorrelated. Next, when X and
Y are uncorrelated but are not independent of eachother,assume g(X) and
that
h(Y ) are uncorrelated. Then, we have E e j (ω1 X +ω2 Y ) = E e jω1 X E e jω2 Y from
E{g(X )h(Y )} = E{g(X )}E{h(Y )} with g(x) = e jω1 x and h(y) = e jω2 y . This result
implies
1 ,ω2 ) ofX and Y can be expressed as ϕ X,Y (ω1 , ω2 ) =
that thejoint cf ϕ X,Y (ω
E e j (ω1 X +ω2 Y ) = E e jω1 X E e jω2 Y = ϕ X (ω1 ) ϕY (ω2 ) in terms of the marginal
cf’s of X and Y , a contradiction that X and Y are not independent of each other.
In short, we have E{X Y } = E{X }E{Y } E{g(X )h(Y )} = E{g(X )}E{h(Y )}, i.e.,
when X and Y are uncorrelated but are not independent of each other, g(X ) and
h(Y ) are not necessarily uncorrelated. ♠
The joint moments of random vectors can be easily obtained by using the joint cf
or joint mgf as shown in the following theorem:
∞ ∞
Theorem 4.3.7 The joint moment m k1 k2 ···kn = E X 1k1 X 2k2 · · · X nkn = −∞ −∞ · · ·
∞ k1 k2
−∞ x 1 x 2 · · · x n f X (x)d x can be obtained as
kn
∂ K ϕ X (ω)
−K
m k1 k2 ···kn = j (4.3.36)
∂ω1k1 ∂ω2k2 · · · ∂ωnkn ω=0
n
from the joint cf ϕ X (ω), where K = ki .
i=1
From the joint mgf M X (t), we can obtain the joint moment m k1 k2 ···kn also as
∂ K M X (t)
m k1 k2 ···kn = k1 k2 kn
. (4.3.37)
∂t1 ∂t2 · · · ∂tn t=0
As a special case of (4.3.37) for the two-dimensional random vector (X, Y ), we have
∂ k+r M X,Y (t1 , t2 )
m kr = , (4.3.38)
∂t1k ∂t2r (t1 ,t2 )=(0,0)
Example 4.3.11 (Romano and Siegel 1986) When the two functions G and H are
equal, we have F ∗ G = F ∗ H . The converse does not always hold true, i.e., F ∗
G = F ∗ H does not necessarily imply G = H . Let us consider an example. Assume
⎧1
⎨ 2, x = 0,
P(X = x) = 2
π 2 (2n−1)2
, x = ±(2n − 1)π, n = 1, 2, . . . , (4.3.39)
⎩
0, otherwise
∞
for a random variable X . Then, the cf ϕ X (t) = 1
2
+ 4
π 2 (2n−1)2
cos{(2n − 1)π t} of
n=1
X is a train of triangular pulses with period 2 and
for |t| ≤ 2. It is easy to see that ϕ X (t) = ϕY (t) for |t| ≤ 1 and that |ϕ X (t)| = |ϕY (t)|
for all t from (4.3.40) and (4.3.42). Now, for a random variable Z with the pdf
1−cos x
, x = 0,
f Z (x) = π x2 (4.3.43)
1
2π
, x = 0,
we have the cf ϕ Z (t) = (1 − |t|)u(1 − |t|). Then, we have ϕ Z (t)ϕ X (t) = ϕ Z (t)ϕY (t)
and FZ (x) ∗ FX (x) = FZ (x) ∗ FY (x), but FX (x) = FY (x), where FX , FY , and FZ
denote the cdf’s of X , Y , and Z , respectively. ♦
In this section, we discuss conditional probability functions (Ross 2009) and condi-
tional expected values mainly for bi-variate random vectors.
4.4 Conditional Distributions 299
We first extend the discussion on the conditional distribution explored in Sect. 3.4.
When the event A is assumed, the conditional joint cdf16 FZ ,W |A (z, w) = P(Z ≤
z, W ≤ w| A) of Z and W is
P(Z ≤ z, W ≤ w, A)
FZ ,W |A (z, w) = . (4.4.1)
P(A)
Differentiating the conditional joint cdf (4.4.3) with respect to x and y, we get the
conditional joint pdf f X,Y |A (x, y) as
∂ 1 ∂
f X,Y |X ≤x (x, y) = FX,Y (x, y)
∂ x FX (x) ∂ y
f X,Y (x, y) f X (x) ∂
= − 2 FX,Y (x, y). (4.4.4)
FX (x) FX (x) ∂ y
FX,Y (x, y)
FY |X ≤x (y) = , (4.4.5)
FX (x)
we get
16 As in other cases, the conditional joint cdf is also referred to as the conditional cdf if it does not
cause any ambiguity.
17 The conditional joint pdf is also referred to as the conditional pdf if it does not cause any ambiguity.
300 4 Random Vectors
Definition 4.4.1 (conditional cdf; conditional pdf; conditional pmf) For a random
vector (X, Y ), P(Y ≤ y|X = x) is called the conditional joint cdf, or simply the
conditional cdf, of Y given X = x, and is written as FX,Y |X =x (x, y), FY |X =x (y),
or FY |X (y|x). For a continuous random vector (X, Y ), the derivative ∂∂y FY |X (y|x),
denoted by f Y |X (y|x), is called the conditional pdf of Y given X = x. For a discrete
random vector (X, Y ), P(Y = y|X = x) is called the conditional joint pmf or the
conditional pmf of Y given X = x and is written as p X,Y |X =x (x, y), pY |X =x (y), or
pY |X (y|x).
The relationships among the conditional pdf f Y |X (y|x), joint pdf f X,Y (x, y), and
marginal pdf f X (x) and those among the conditional pmf pY |X (y|x), joint pmf
p X,Y (x, y), and marginal pmf p X (x) are described in the following theorem:
p X,Y (x, y)
pY |X (y|x) = (4.4.7)
p X (x)
when (X, Y ) is a discrete random vector. Similarly, the conditional pdf f Y |X (y|x)
can be expressed as
f X,Y (x, y)
f Y |X (y|x) = (4.4.8)
f X (x)
∂ ∂2
and consequently, f Y |X (y|x) = ∂y
FY |X (y|x) = 1
f X (x) ∂ x∂ y
FX,Y (x, y), which is the
same as (4.4.8). ♠
4.4 Conditional Distributions 301
∂
We can similarly obtain the conditional pdf f X |Y (x|y) = ∂x
FX |Y (x|y) as
f X,Y (x, y)
f X |Y (x|y) = , (4.4.10)
f Y (y)
which can also be obtained directly from (4.4.8) by replacing X , Y , x, and y with Y ,
X , y, and x, respectively. Note the similarity among (4.4.7), (4.4.8), and (2.4.1).
Example 4.4.2 (Ross 1976) Obtain the conditional pdf f X |Y (x|y) when the joint
pdf of (X, Y ) is f X,Y (x, y) = 6x y(2 − x − y)u(x)u(1 − x)u(y)u(1 − y).
∞
Solution Employing (4.4.10) and noting that f Y (y) = −∞ f X,Y (x, y)d x =
1
0 6x y(2 − x − y)d x, we have f X |Y (x|y) = 1
6x y(2−x−y)
, i.e.,
0 6x y(2−x−y)d x
6x(2 − x − y)
f X |Y (x|y) = (4.4.11)
4 − 3y
Example 4.4.3 (Ross 1976) Obtain the conditional pdf f X |Y (x|y) when the joint
pdf of (X, Y ) is f X,Y (x, y) = 4y(x − y)e−(x+y) u(y)u(x − y).
−(x+y)
Solution We easily get f X |Y (x|y) = ∞4y(x−y)e
4y(x−y)e −(x+y) d x , i.e.,
y
FX,Y (x, y)
FY |X (y|x) = (4.4.13)
FX (x)
(x,y) ≤x,Y ≤y)
from FY |X (y|x) = P(Y ≤ y|X = x) and FX,Y = P(XP(X = P(Y ≤ y|X ≤
yFX (x) ≤x)
x). Employing (4.4.8) with FY |X (y|t) = −∞ f Y |X (s|t)ds, we get FX,Y (x, y) =
x x y x y
−∞ FY |X (y|t) f X (t)dt = −∞ −∞ f Y |X (s|t)ds f X (t)dt = −∞ −∞ f X,Y (t, s)dsdt
from (4.4.9). This result is the same as (4.1.10) or (4.1.21) in that the cdf can be
obtained by integrating the pdf.
for a discrete random vector (X, Y ), and the pdf f X of X can be expressed as
302 4 Random Vectors
∞
f X (x) = f X |Y (x|y) f Y (y)dy (4.4.15)
−∞
f X |Y (x|y) f Y (y)
f Y |X (y|x) = ∞ (4.4.16)
−∞ f X |Y (x|y) f Y (y)dy
for any x such that f X (x) > 0 by noting that f X,Y (x, y) = f X |Y (x|y) f Y (y) and
∞
f X (x) = −∞ f X |Y (x|y) f Y (y)dy.
p X |Y (x|y) pY (y)
pY |X (y|x) = (4.4.17)
∞
p X |Y (x|y) pY (y)
y=−∞
for any x such that p X (x) > 0 when (X, Y ) is a discrete random vector.
When X and Y are independent of each other, we have
for every point y such that f Y (y) > 0, FY |X (y|x) = FY (y) for every point x such that
FX (x) > 0, and f Y |X (y|x) = f Y (y) for every point x such that f X (x) > 0 because
FX,Y (x, y) = FX (x)FY (y) and f X,Y (x, y) = f X (x) f Y (y).
of X . Consider
4.4 Conditional Distributions 303
0, if X = 4,
Y = (4.4.21)
1, if X = 3 or 5
Noting that p X,Y (x, y) = pY |X (y|x) p X (x) from (4.4.17) and using (4.4.20) and
(4.4.24), we get
p X,Y (x,y)
Similarly, noting that p X |Y (x|y) = pY (y)
from (4.4.17) and using (4.4.22) and
(4.4.25), we get
Noting that p X,Z (x, z) = p Z |X (z|x) p X (x) from (4.4.17) and using (4.4.20) and
(4.4.27), we get
304 4 Random Vectors
p X,Z (x,z)
Similarly, noting that p X |Z (x|z) = p Z (z)
from (4.4.17) and using (4.4.23) and
(4.4.28), we get
1
p Z |Y (2|1) = (4.4.30)
3
P(X =5)
and p Z |Y (4|1) = P(X − Y = 4|Y = 1) = P(X =3 or 5)
as
2
p Z |Y (4|1) = (4.4.31)
3
We also get
pY,Z (y,z)
from pY |Z (y|z) = p Z (z)
by using (4.4.23) and (4.4.32). ♦
1
f X,Y (x, y) = u (2 − |x|) u (2 − |y|) , (4.4.34)
16
obtain the conditional joint cdf FX,Y |A and the conditional joint pdf f X,Y |A for A =
{|X | ≤ 1, |Y | ≤ 1}.
1 1
Solution First, we have P(A) = −1 −1 f X,Y (u, v)dudv = 16 1
× 4 = 41 . Next, for
−2 −1 1 2 u
−1
−2
1
f X,Y |A (x, y) = u(1 − |x|)u(1 − |y|) (4.4.37)
4
by differentiating FX,Y |A (x, y). ♦
Let Y = g(X) and g −1 (·) be the inverse of the function g(·). We then have
f Z|Y (z|Y = y) = f Z|X z|X = g −1 ( y) , (4.4.38)
which implies that, when the relationship between X and Y can be expressed via
an invertible function, conditioning on X = g −1 ( y) is equivalent to conditioning on
Y = y.
306 4 Random Vectors
For one random variable, we have discussed the conditional expected value in
(3.4.30). We now extend the discussion into random vectors with the conditioning
event expressed in terms of random variables.
when X = x.
Example 4.4.7 Obtain the conditional expected value E{X |Y = y} in Example
4.4.2.
1 2
Solution We easily get E{X |Y = y} = 0 6x (2−x−y)
4−3y
d x = 4−3y
1
2(2 − y)x 3
− 3 x4
1
2
= 5−4y for 0 < y < 1.
x=0 8−6y
♦
E {E{Y |X }} = E{Y }
⎧∞
⎪
⎪
⎪ −∞ E{Y |X = x} f X (x)d x,
⎪
⎨ X is a continuous random variable,
= ∞
(4.4.40)
⎪
⎪ E{Y |X = x} p X (x),
⎪ x=−∞
⎪
⎩
X is a discrete random variable
E { g(X, Y )| X = x} = E { g (x, Y )| X = x}
= g(x, y) f Y |X ( y| x) d y. (4.4.41)
all y
Furthermore, for the expected value of the random vector E { g(X, Y )| X}, we
have
which is called the factorization property (Gardner 1990). The factorization property
implies that the random vector g 1 (X) under the condition X = x, or equivalently
g 1 (x), is not probabilistic.
As we have observed in Sects. 2.4 and 3.4.3, we can obtain the probability and
expected value more easily by first obtaining the conditional probability and condi-
tional expected value with appropriate conditioning. Let us now discuss how we can
obtain expected values for random vectors by first obtaining the conditional expected
value with appropriate conditioning on random vectors.
Solution Denote by Jm the number of pairs remaining after m numbers have been
deleted. After m numbers have been deleted, we have 2n − m − 2Jm non-paired
numbers. When we delete one more number, the number of pairs is Jm − 1 with
2Jm
probability 2n−m or Jm with probability 1 − 2n−m
2Jm
. Based on this observation, we
have
2Jm 2Jm
E { Jm+1 | Jm } = (Jm − 1) + Jm 1 −
2n − m 2n − m
2n − m − 2
= Jm . (4.4.45)
2n − m
Noting that E E Jm+1 Jm = E Jm+1 and E {J0 } = n , we get E {Jm+1 } =
E {Jm } 2n−m−2
2n−m
= E {Jm−1 } 2n−m−2 2n−m−1
2n−m 2n−m+1
= · · · = E {J0 } 2n−m−2 2n−m−1
2n−m 2n−m+1
· · · 2n−2
2n
=
(2n−m−2)(2n−m−1)
2(2n−1)
, i.e.,
(2n − m − 1)(2n − m)
E {Jm } =
2(2n − 1)
C
2n−2 m
=n . (4.4.46)
2n Cm
Example 4.4.10 (Ross 1996) We toss a coin repeatedly until head appears r times
consecutively. When the probability of a head is p, obtain the expected value of
repetitions.
4.4 Conditional Distributions 309
we get
k
1
αk = . (4.4.50)
i=1
pi
A generalization of this problem, finding the mean time for a pattern, is discussed in
Appendix 4.2. ♦
As we have observed in Examples 2.5.23 and 3.1.34, the unit step and impulse
functions are quite useful in representing the cdf and pdf of discrete and hybrid
random variables. In addition, the unit step and impulse functions can be used for
obtaining joint cdf’s and joint pdf’s expressed in several formulas depending on the
condition.
Example 4.5.1 Obtain the joint cdf FX,X +a and joint pdf f X,X +a of X and Y =
X + a for a random variable X with pdf f X and cdf FX , where a is a constant.
Example 4.5.2 Obtain the joint cdf FX,cX and the joint pdf f X,cX of X and Y = cX
for a continuous random variable X with pdf f X and cdf FX , where c is a constant.
Solution For c > 0, the joint cdf FX,cX (x, y) = P(X ≤ x, cX ≤ y) of X and Y =
cX can be obtained as
FX,cX (x, y) =
P(X ≤ x), x ≤ cy ,
P X ≤ c , x ≥ cy
y
y y y
= FX (x)u − x + FX u x− . (4.5.3)
c c c
For c < 0, the joint cdf is FX,cX (x, y) = P X ≤ x, X ≥ cy , i.e.,
0, x ≤ cy ,
FX,cX (x, y) =
P c ≤ X ≤ x , x > cy
y
y y
= FX (x) − FX u x− . (4.5.4)
c c
∞ ∞ ∞
18 Here, −∞ −∞ f X (x)δ(y − x − a)dydx = −∞ f X (x)d x = 1.
4.5 Impulse Functions and Random Vectors 311
⎧
⎪
⎪ FX (x) − FX cy u x − y
, c < 0,
⎨ c
FX (x)u(y),
y c = 0,
FX,cX (x, y) = (4.5.6)
⎪
⎪ F X (x)u c −x
⎩
+FX cy u x − cy , c > 0.
1 y
f X,cX (x, y) = f X (x)δ −x (4.5.8)
c c
∂2
for c > 0, and f X,cX (x, y) = ∂ x∂ y
FX (x)u(y) = f X (x)δ(y) for c = 0: in short,
f X (x)δ(y),
c = 0,
f X,cX (x, y) = (4.5.9)
1
f (x)δ x −
|c| X
y
c
, c = 0.
∞ ∞ ∞ ∞
Note that −∞ −∞ f X,cX (x, y)d yd x = −∞ f X (x)d x −∞ δ(y)dy = 1 for c = 0.
∞ ∞ ∞ ∞
For c = 0, we have −∞ −∞ f X,cX (x, y)d yd x = |c| 1
−∞ f X (x) −∞ δ x − c
y
∞ −∞
−∞ δ x − c dy = c ∞ δ(x − t)dt = −c for c < 0,
y
d yd x. From and
∞ ∞ ∞
δ x − y
dy
c
= c δ(x − t)dt = c for c > 0, we have δ x − y
dy =
∞ ∞
−∞ −∞ −∞ c
|c|. Therefore, −∞ −∞ f X,cX (x, y)d yd x = 1. ♦
Example 4.5.3 Obtain the joint cdf FX,|X | and the joint pdf f X,|X | of X and Y = |X |
for a continuous random variable X with pdf f X and cdf FX .
312 4 Random Vectors
where
satisfies G 1 (x, −x) = 0. We have used u(y + x)u(y − x)u(y) = u(y − x)u(y + x)
in obtaining (4.5.11). Note that FX,|X | (x, y) = 0 for y < 0 because u(x + y)u(y −
x) = 0 from (x + y)(y − x) = y 2 − x 2 < 0.
Noting that
G 1 (x, −x) = 0, δ(x − y) = δ(y − x), and u(αx) = u(x) for α > 0, we get
∂
FX,|X | (x, y) = δ(x + y)u(y − x)G 1 (x, y) − u(y + x)δ(y − x)G 1 (x, y)
∂x
+u(x + y)u(y − x) f X (x) + u(y)δ(x − y)G 1 (y, y)
= δ(x + y)u(−2x)G 1 (x, −x) − u(2x)δ(y − x)G 1 (x, x)
+u(x + y)u(y − x) f X (x) + u(x)δ(x − y)G 1 (x, x)
= u(x + y)u(y − x) f X (x) (4.5.14)
∞
f |X | (y) = {δ(x + y) f X (−y) + δ(x − y) f X (y)} u(y)d x
−∞
= { f X (−y) + f X (y)} u(y), (4.5.16)
which is equivalent
∞ to that obtained in Example 3.2.7. From u(x) + u(−x) = 1, we
also have −∞ f X,|X | (x, y)dy = f X (x)u(−x) + f X (x)u(x) = f X (x). ♦
4.5 Impulse Functions and Random Vectors 313
with a and c constants. First, we choose an appropriate auxiliary variable, for exam-
ple, Z = X 2 or Z = X 1 when g1 (X) is not or is, respectively, in the form of d X 2 + b.
Then, using Theorem 4.2.1, we obtain the joint pdf f Y1 ,Z of (Y1 , Z ). We next obtain
the pdf of Y1 as
∞
f Y1 (x) = f Y1 ,Z (x, v)dv (4.5.19)
−∞
x2 z
1 1
VX
VY
0 1 x1 −1 0 1 y
Fig. 4.17 The support VY of f X 1 −X 2 ,X 2 (y, z) when the support of f X (x1 , x2 ) is VX . The intervals
of integrations in the cases −1 < y < 0 and 0 < y < 1 are also represented as lines with two arrows
∞
We can then obtain the pdf f Y1 (y) = −∞ u(y + z)u(1 − y − z)u(z)u(1 − z)dz of
Y1 = g1 (X) = X 1 − X 2 as
⎧
⎪
⎨ 0, |y| > 1,
1
f Y1 (y) = −y dz, −1 < y < 0,
⎩ 1−y dz, 0 < y < 1
⎪
⎧ 0
⎨ 0, |y| > 1,
= 1 + y, −1 < y < 0,
⎩
1 − y, 0 < y < 1
= (1 − |y|)u(1 − |y|) (4.5.22)
by integrating f Y1 ,Z (y, z), for which Fig. 4.17 is useful in identifying the integration
intervals.
Next, using (4.5.20), we get the joint pdf
1 y
f Y1 ,Y2 (x, y) = (1 − |x|)u(1 − |x|)δ x − (4.5.23)
2 2
of (Y1 , Y2 ) = (X 1 − X 2 , 2X 1 − 2X 2 ) and the joint pdf
∞ δ(t)(−2dt) = 2 −∞ δ(t)dt = 2. ♦
We have briefly discussed how we can obtain the joint pdf and joint cdf in some
special cases by employing the unit step and impulse functions. This approach is
also quite fruitful in dealing with the order statistics and rank statistics.
Appendices 315
Appendices
Let us discuss in more detail the multinomial random variables introduced in Exam-
ple 4.1.4.
Definition 4.A.1 (multinomial distribution) Assume n repetitions of an independent
experiment of which the outcomes are a collection {Ai }ri=1 of disjoint events with
r
probability {P ( Ai ) = pi }ri=1 , where pi = 1. Denote by X i the number of occur-
i=1
rences of event Ai . Then, the joint distribution of X = (X 1 , X 2 , . . . , X r ) is called
the multinomial distribution, and the joint pmf of X is
n!
p X (k1 , k2 , . . . , kr ) = p k1 p k2 · · · prkr (4.A.1)
k1 !k2 ! · · · kr ! 1 2
r
for {ki ∈ {0, 1, . . . , n}}ri=1 and ki = n.
i=1
r
k
The right-hand side of (4.A.1) is the coefficient of t j j in the multinomial
j=1
expansion of ( p1 t1 + p2 t2 + · · · + pr tr )n .
Example 4.A.1 In a repetition of rolling of a fair die ten times, let {X i }i=1
3
be the
numbers of A1 = {1}, A2 = {an even number}, and A3 = {3, 5}, respectively. Then,
the joint pmf of X = (X 1 , X 2 , X 3 ) is
k1 k2 k3
10! 1 1 1
p X (k1 , k2 , k3 ) = (4.A.2)
k1 !k2 !k3 ! 6 2 3
3
for {ki ∈ {0, 1, . . . , 10}}i=1
3
such that ki = 10. Based on this pmf, the prob-
i=1
ability of the event {three times of A1 , six times of A2 } = {X 1 = 3, X 2 = 6, X 3 =
1 3 1 6 1 1
1} can be obtained as p X (3, 6, 1) = 3!6!1!
10!
6 2 3
= 1728
35
≈ 2.025 × 10−2 . ♦
r
−1
where λ = λi . Based on these results, we can show that p X (k1 , k2 , . . . , kr ) =
i=1
k1 k2 k1 k2
···n kr −1 k1 k2 kr −1
n!
p
k1 !k2 !···kr ! 1 2
p · · · prkr → nk1 !kn 2 !···k r −1 !
p1 p2 · · · pr −1 exp −λ , i.e.,
r −1
λiki
p X (k1 , k2 , . . . , kr ) → e−λi . (4.A.3)
i=1
ki !
The result (4.A.3) with r = 2 is clearly the same as (3.5.19) obtained in Theorem
3.5.2 for the binomial distribution. ♦
√
Example 4.A.3 For ki = npi + O n and n → ∞, the multinomial pmf can be
approximated as
n!
p X (k1 , k2 , . . . , kr ) = p k1 p k2 · · · prkr
k1 !k2 ! · · · kr ! 1 2
1 (ki − npi )2
r
1
≈ exp − . (4.A.4)
(2π n)r −1 p1 p2 · · · pr 2 i=1 npi
which is the same as (3.5.16) of Theorem 3.5.1 for the binomial distribution. ♦
n− s kai
s
1−
i=1
pai ka
i=1
s
pai i
p X s k a1 , k a2 , . . . , k as = n! . (4.A.6)
s k !
n− kai ! i=1 ai
i=1
Appendices 317
r −1
where i is not equal to any of b j j=1 . It is also known that we have the conditional
expected value
n − X j pi
E Xi | X j = , (4.A.8)
1 − pj
Denote by X k the outcome of the k-th trial of an experiment with the pmf
p X k ( j) = p j (4.A.10)
∞
for j = 1, 2, . . ., where p j = 1. The number of trials of the experiment until
j=1
a pattern M = (i 1 , i 2 , . . . , i n ) is observed for the first time is called the time to
pattern M, which is denoted by T = T (M) = T (i 1 , i 2 , . . . , i n ). For example, when
the sequence of the outcomes is (6, 4, 9, 5, 5, 9, 5, 7, 3, 2, . . .), the time to pattern
(9, 5, 7) is T (9, 5, 7) = 8. Now, let us obtain (Nielsen 1973) the mean time E{T (M)}
for the pattern M.
First, when M satisfies
318 4 Random Vectors
∞
∞ k−1
First, the mean time E{T (i 1 )} = kP(T (i 1 ) = k) = k 1 − pi1 pi1 to pat-
k=1 k=1
tern i 1 of length 1 is
1
E{T (i 1 )} = . (4.A.12)
pi1
When M has J overlapping pieces, let the lengths of the overlapping pieces be
K 0 < K 1 < · · · < K J < K J +1 with K 0 = 0 and K J +1 = n, and express M as M =
i 1 , i 2 , . . . , i K 1 , i K 1 +1 , . . . , i K 2 , i K 2 +1 , . . . , i K J , i K J +1 , . . . , i n−1 , i n . If we write the
J
overlapping pieces L K j = i 1 , i 2 , . . . , i K j j=1 as
i1 i2 · · · i K1
i1 i2 · · · i K 2,1 +1 i K 2,1 +2 · · · i K 2
.. .. .. .. .. (4.A.13)
. . . . .
i 1 i 2 · · · i K J,2 +1 i K J,2 +2 · · · i K J,1 +1 i K J,1 +2 · · · i K J ,
∞
E {T (A1 )} = E {T (A1 )| X n+1 = x} P (X n+1 = x) . (4.A.15)
x=1
from which we can get E T (A1 ) X n+1 = i K j +1 = 1 + E T (M+1 ) L̃ K j +1 . We
can similarly get
⎧
⎪
⎪
⎪ 1 + E T (M +1 ) L̃ K +1 , x = i K 0 +1 ,
⎪
⎪ 0
⎪
⎪
⎪
⎪ 1 + E T (M+1 ) L̃ K 1 +1 , x = i K 1 +1 ,
⎪
⎨ .. ..
E {T (A1 )| X n+1 = x} = . . (4.A.17)
⎪
⎪
⎪
⎪ 1 + E T (M+1 ) L̃ K J +1 , x = i K J +1 ,
⎪
⎪
⎪
⎪
⎪
⎪ 1, x = i n+1 ,
⎩
1 + E {T (M+1 )} , otherwise
Let us next consider the case in which some are the same among i K 0 +1 , i K 1 +1 ,
. . ., i K J +1 , and i n+1 . For example, assume a < b and i K a +1 = i K b +1 . Then, for
x=
i K a +1 = i K b +1 in (4.A.15) and (4.A.17), the line ‘1 + E T (M+1 ) L̃ K a +1 , x =
i K a +1 ’ corresponding to the K a -th piece among the lines of (4.A.17) will disappear
because the longest overlapping piece in the last part of M+1 is not L̃ K a +1 but L̃ K b +1 ,
Based on this fact, if we follow steps similar to those leading to (4.A.19) and (4.A.20),
we get
pin+1 E {T (M+1 )} = E{T (M)} + 1 − pi K j +1 E T L̃ K j +1 , (4.A.21)
j
where denotes the sum from j = 0 to J letting all E T L̃ K a +1 to 0 when
j
J
i K a +1 = i K b +1 for 0 ≤ a < b ≤ J + 1. Note here that K j j=1 are the lengths of the
overlapping pieces of M, not of M+1 . Note also that (4.A.20) is a special case of
(4.A.21): in other words, (4.A.21) is always applicable.
In essence, starting from E {T (i 1 )} = p1i shown in (4.A.12), we can successively
1
obtain E {T (i 1 , i 2 )}, E {T (i 1 , i 2 , i 3 )}, . . ., E{T (M)} based on (4.A.21).
Example 4.A.4 For an i.i.d. random variables {X k }∞ k=1 with the marginal pmf
p X k ( j) = p j , obtain the mean time to M = (5, 4, 5, 3).
"
i.e., E{T (5, 4)} = 1
E{T (5)} + 1 − p5 E{T (5)} = p41p5 . Next, when (5, 4, 5) is
p4
M+1 , because J = 0 and i K 0 +1 = 5 = i n+1 , we get pi K j +1 E T L̃ K j +1 = 0.
" j
Thus, E{T (5, 4, 5)} = p15 E{T (5, 4)} + 1 = p 1p2 + p15 . Finally, when (5, 4, 5, 3)
4 5
is M+1 , because J = 1 and K 1 = 1, we have i K 0 +1 = i 1 = 5, i K 1 +1 = i 2 = 4,
i K J +1 +1 = i 4 = 3, and
pi K j +1 E T L̃ K j +1 = p5 E{T (5)} + p4 E{T (5, 4)}, (4.A.23)
j
"
i.e., E{T (5, 4, 5, 3)} = 1
p3
E{T (5, 4, 5)} + 1 − p5 E{T (5)} − p4 E{T (5, 4)} =
1
p3 p4 p52
. ♦
Appendices 321
for every k ∈ {1, 2, . . . , n − 1}. Based on this observation, let us show that
{T = j + n} T > j, X j+1 , X j+2 , . . . , X j+n = M . (4.A.25)
First, when T = j + n, the first occurrence of M is X j+1 , X j+2 , . . . , X j+n , which
implies that T > j and
X j+1 , X j+2 , . . . , X j+n = M. (4.A.26)
Next, let us show that T = j + n when T > j and (4.A.26) holds true.
If k ∈ {1, 2, . . . , n − 1} and T = j + k, then we have X j+k = i n , X j+k−1 =
i n−1 , . . . , X j+1 = i n−k+1 . This is a contradiction to X j+1 , X j+2 , . . . , X j+n =
(i 1 , i 2 , . . . , i k ) = (i n−k+1 , i n−k+2 , . . . , i n ) implied by (4.A.24) and (4.A.26). In short,
for any value k in {1, 2, . . . , n − 1}, we have T = j + k and thus T ≥ j + n. Mean-
while, (4.A.26) implies T ≤ j + n. Thus, we get T = j + n.
From (4.A.25), we have
P(T = j + n) = P T > j, X j+1 , X j+2 , . . . , X j+n = M . (4.A.27)
∞
where p̂ = pi1 pi2 · · · pin . Now, recollecting that P(T = j + n) = 1 and that
j=0
∞ "
P(T > j) = P(T > 0) + P(T > 1) + · · · = P(T = 1) + P(T = 2) + · · · +
j=0
322 4 Random Vectors
"
∞
P(T = 2) + P(T = 3) + · · · + · · · = j P(T = j), i.e.,
j=0
∞
P(T > j) = E{T }, (4.A.29)
j=0
∞
we get p̂ P(T > j) = p̂E{T } = 1 from (4.A.28). Thus, we have E{T (M)} = 1p̂ ,
j=0
i.e.,
1
E{T (M)} = . (4.A.30)
pi1 pi2 · · · pin
Example 4.A.5 For the pattern M = (9, 5, 7), we have E{T (9, 5, 7)} = p5 p17 p9 .
Thus, to observe the pattern (9, 5, 7), we have to wait on the average until the p5 p17 p9 -
th repetition. In tossing a fair die, we need to repeat E{T (3, 5)} = p31p5 = 36 times
on the average to observe the pattern (3, 5) for the first time. ♦
Mx = (i 1 , i 2 , . . . , i n , x) (4.A.31)
length n + 1 by appropriately
of
19
choosing x as x ∈/ {i 1 , i 2 , . . . , i n } or x ∈
/
i K 0 +1 , i K 1 +1 , . . . , i K J +1 . Then, from (4.A.30), we have
1
E {T (Mx )} = . (4.A.32)
px p̂
1
E{T (M)} = − 1 + pi K j +1 E T L̃ K j +1 (4.A.33)
p̂ j
by noting that Mx in (4.A.31) and M+1 in (4.A.21) are the same. Now, if
we consider the case in which M is not an overlapping pattern, the last
term of (4.A.33) becomes pi K j +1 E T L̃ K j +1 = pi K0 +1 E T L̃ K 0 +1 =
j
pi1 E {T (i 1 )} = 1. Consequently, (4.A.33) and (4.A.30) are the same. Thus, for any
overlapping or non-overlapping pattern M, we can use (4.A.33) to obtain E{T (M)}.
1
E{T (9, 5, 9, 1, 9, 5, 9)} = 2 4
− 1 + p9 E{T (9)} + p5 E{T (9, 5)}
p1 p5 p9
"
+ p1 E{T (9, 5, 9, 1)}
1 1 1
= 2 4
+ 2
+ (4.A.34)
p1 p5 p9 p5 p9 p9
Comparing Examples 4.A.4 and 4.A.6, it is easy to see that we can obtain E{T (M)}
faster from (4.A.30) and (4.A.33) than from (4.A.21).
where K 1 < K 2 < · · · < K J are the lengths of the overlapping pieces with K J +1 =
n.
Proof For convenience, let α j = pi K j +1 E T L̃ K j +1 and β j = E T L K j . Also
let
⎧
⎨ 1, if i K j +1 = i K m +1 for every value of
j = m ∈ { j + 1, j + 2, . . . , J }, (4.A.36)
⎩
0, otherwise
1 J
E{T (M)} = − 1 + αjj. (4.A.37)
pi1 pi2 · · · pin j=0
324 4 Random Vectors
j−1
Now, α0 = pi1 E {T (i 1 )} = 1 and α j = β j + 1 − αl l for j = 1, 2, . . . , J
l=0
J
from (4.A.21). Solving for α j j=1 , we get α1 = β1 + 1 − 0 , α2 = β2 + 1 −
(1 α1 + 0 α0 ) = β2 − 1 β1 + (1 − 0 ) (1 − 1 ), α3 = β3 + 1 − (2 α2 + 1 α1 +
0 α0 ) = β3 − 2 β2 − 1 (1 − 2 ) β1 + (1 − 0 ) (1 − 1 ) (1 − 2 ), . . ., and
α J = β J − J −1 β J −1 − J −2 (1 − J −1 ) β J −2 − · · ·
− 1 (1 − 2 ) (1 − 3 ) · · · (1 − J −1 ) β1
+ (1 − 0 ) (1 − 1 ) · · · (1 − J −1 ) . (4.A.38)
Therefore,
J
α j j = β J + ( J −1 − J −1 ) β J −1 + { J −2 − J −2 J −1
j=0
− J −2 (1 − J −1 )} β J −2 + · · · + {1 − 1 2 − 1 (1 − 2 ) 3
− · · · − 1 (1 − 2 ) (1 − 3 ) · · · (1 − J −1 )} β1 + {0 + (1 − 0 ) 1
+ (1 − 0 ) (1 − 1 ) 2 + · · ·
+ (1 − 0 ) (1 − 1 ) · · · (1 − J −1 )} . (4.A.39)
In the right-hand side of (4.A.39), the second, third, . . ., second last terms are all 0,
and the last term is
0 + (1 − 0 ) 1 + (1 − 0 ) (1 − 1 ) 2 + · · ·
+ (1 − 0 ) (1 − 1 ) · · · (1 − J −3 ) J −2
+ (1 − 0 ) (1 − 1 ) · · · (1 − J −2 ) J −1 + (1 − 0 ) (1 − 1 ) · · · (1 − J −1 )
= 0 + (1 − 0 ) 1 + (1 − 0 ) (1 − 1 ) 2 + · · ·
+ (1 − 0 ) (1 − 1 ) · · · (1 − J −3 ) J −2 + (1 − 0 ) (1 − 1 ) · · · (1 − J −2 )
..
.
= 0 + (1 − 0 ) 1 + (1 − 0 ) (1 − 1 )
= 1. (4.A.40)
Thus, noting (4.A.40) and using (4.A.39) into (4.A.37), we get E{T (M)} =
1
pi pi ··· pi
− 1 + β J + 1, i.e.,
1 2 n
1
E{T (M)} = + E T L KJ . (4.A.41)
pi1 pi2 · · · pin
Next, if we obtain E T L K J after some steps similar to those for (4.A.41) by
recollecting that the overlapping pieces of L K J are L K 1 , L K 2 , . . . , L K J −1 , we have
Appendices 325
E T L KJ = 1
pi1 pi2 ··· pi K
+ E T L K J −1 . Repeating this procedure, and noting
J
that L 1 is not an overlapping piece, we get (4.A.35) by using (4.A.30). ♠
1+ p42 p5
Example 4.A.7 Using (4.A.35), it is easy to get E{T (5, 4, 4, 5)} = p42 p52
,
E{T (5, 4, 5, 4)} = 1+ p4 p5
p42 p52
, E{T (5, 4, 5, 4, 5)} = 1
p42 p53
+ 1
p4 p52
+ 1
p5
, and E{T (5, 4,
4, 5, 4, 4, 5)} = 1
p44 p53
+ p2 p2 + p15 .
1
♦
4 5
Example 4.A.8 Assume a coin with P(h) = p = 1 − P(t), where h and t denote
head and tail, respectively. Then, the expected numbers of tosses until the first
occurrences of h, tht, htht, ht, hh, and hthhthh are E{T (h)} = 1p , E{T (tht)} =
1
pq 2
+ q1 , E{T (htht)} = p21q 2 + pq
1
, E{T (hthh)} = p13 q + 1p , and E{T (hthhthh)} =
1
p5 q 2
+ p13 q + 1p , respectively, where q = 1 − p. ♦
Exercises
μe−μx (μx)n−1
f (x) = u(x) (4.E.1)
(n − 1)!
is the pdf of the sum of n i.i.d. exponential random variables with rate μ.
Exercise 4.2 A box contains three red and two green balls. We choose a ball from
the box, discard it, and choose another ball from the box. Let X = 1 and X = 2 when
the first ball is red and green, respectively, and Y = 4 and Y = 3 when the second
ball is red and green, respectively. Obtain the pmf p X of X , pmf pY of Y , joint pmf
p X,Y of X and Y , conditional pmf pY |X of Y given X , conditional pmf p X |Y of X
given Y , and pmf p X +Y of X + Y .
Exercise 4.3 For two i.i.d. random variables X 1 and X 2 with marginal distribution
P(1) = P(−1) = 0.5, let X 3 = X 1 X 2 . Are X 1 , X 2 , and X 3 pairwise independent?
Are they independent?
Exercise
When the joint pdf of a random vector (X, Y ) is f X,Y (x, y) =
4.4
a 1 + x y x 2 − y 2 u(1 − |x|)u (1 − |y|), determine the constant a. Are X and
Y independent of each other? If not, obtain the correlation coefficient between X
and Y .
Exercise 4.5 A box contains three red, six green, and five blue balls. A ball is chosen
randomly from the box and then replaced to the box after the color is recorded. After
six trials, let the numbers of red and blue be R and B, respectively. Obtain the
conditional pmf p R|B=3 of R when B = 3 and conditional mean E {R|B = 1} of R
when B = 1.
326 4 Random Vectors
Exercise 4.9 When the joint pdf of X 1 and X 2 is f X 1 ,X 2 (x, y) = 41 u(1 − |x|)u(1 −
%
|y|), obtain the cdf FW and pdf f W of W = X 12 + X 22 .
Exercise 4.10 Two random variables X and Y are independent of each other with the
pdf’s f X (x) = λe−λx u(x) and f Y (y) = μe−μy u(y), where λ > 0 and μ > 0. When
W = min(X, Y ) and
1, if X ≤ Y,
V = (4.E.3)
0, if X > Y,
Exercise 4.11 Obtain the pdf of U = X + Y + Z when the joint pdf of X , Y , and
Z is f X,Y,Z (x, y, z) = 6u(x)u(y)u(z)
(1+x+y+z)4
.
Exercise 4.13 In each of the two cases of the joint pdf f X described
in Exercise
4.12, obtain the joint pdf f Y of Y = (Y1 , Y2 ) = 21 X 12 + X 2 , 21 X 12 − X 2 , and
then, obtain the pdf f Y1 of Y1 and pdf f Y2 of Y2 based on f Y .
Exercise 4.14 Two random variables X ∼ G (α1 , β) and Y ∼ G (α2 , β) are inde-
pendent of each other. Show that Z = X + Y and W = YX are independent of each
other and obtain the pdf of Z and pdf of W .
⎩
1, w ≥ 1
0.
0, w < 1,
and f W (w) = − r1 w r −1 u(w − 1) if r < 0.
1
3. FW (w) = 1
1−w , w ≥1
r
Exercise 4.24 The joint pdf of (X, Y ) is f X,Y (x, y) = 41 u (1 − |x|) u (1 − |y|).
When A = X 2 + Y 2 ≤ a 2 with 0 < a < 1, obtain the conditional joint cdf FX,Y |A
and conditional joint pdf f X,Y |A .
where ρ is the correlation coefficient between the random variables X and Y both
with zero mean and unit variance.
4.29 Obtain the pdf of Y when the joint pdf of (X, Y ) is f X,Y (x, y) =
Exercise
1
y
exp −y − xy u(x)u(y).
Exercise 4.31 For two i.i.d random variables X 1 and X 2 with marginal pmf p(x) =
e−λ λx! ũ(x), where λ > 0, obtain the pmf of M = max (X 1 , X 2 ) and pmf of N =
x
min (X 1 , X 2 ).
Exercise 4.32 For two i.i.d. random variables X and Y with marginal pdf f (z) =
u(z) − u(z − 1), obtain the pdf’s of W = 2X , U = −Y , and Z = W + U .
Exercises 329
Exercise 4.33
For"three i.i.d. random variables X 1 , X 2 , and X 3 with
marginal dis-
tribution U − 21 , 21 , obtain the pdf of Y = X 1 + X 2 + X 3 and E Y 4 .
· · · + X k for k = 1, 2, . . . , n.
Exercise 4.36 For independent random variables X 1 and X 2 with pdf’s f X 1 (x) =
u(x)u(1 − x) and f X 2 (x) = e−x u(x), obtain the pdf of Y = X 1 + X 2 .
Exercise 4.37 Three Poisson random variables X 1 , X 2 , and X 3 with means 2, 1, and
4, respectively, are independent of each other.
(1) Obtain the mgf of Y = X 1 + X 2 + X 3 .
(2) Obtain the distribution of Y .
Exercise 4.38 When the joint pdf of X , Y , and Z is f X,Y,Z (x, y, z) = k(x + y +
z)u(x)u(y)u(z)u(1 − x)u(1 − y)u(1 − z), determine the constant k and obtain the
conditional pdf f Z |X,Y (z|x, y).
Here,
−λis a realization
of a random variable with pdf f (v) = e−v u(v). Obtain
E e X =1 .
Exercise 4.40 When U1 , U2 , and U3 are independent of each other, obtain the joint
pdf f X,Y,Z (x, y, z) of X = U1 , Y = U1 + U2 , and Z = U1 + U2 + U3 in terms of
the pdf’s of U1 , U2 , and U3 .
Exercise 4.42 Consider a random vector (X, Y ) with joint pdf f X,Y (x, y) =
c u (r − |x| − |y|), where c is a constant and r > 0.
(1) Express c in terms of r and obtain the pdf f X (x).
(2) Are X and Y independent of each other?
(3) Obtain the pdf of Z = |X | + |Y |.
Exercise 4.43 Assume X with cdf FX and Y with cdf FY are independent of each
other. Show that P(X ≥ Y ) ≥ 21 when FX (x) ≤ FY (x) at every point x.
Exercise 4.44 The joint pdf of (X, Y ) is f X,Y (x, y) = c x 2 + y 2 u(x)u(y)u (1
−x 2 − y 2 .
(1) Determine the constant c and obtain the pdf of X and pdf of Y . Are X and Y
independent of each other? √
(2) Obtain the joint pdf f R,Θ of R = X 2 + Y 2 and Θ = tan−1 YX .
(3) Obtain the pmf of the output Q = q(R, Θ) of polar quantizer, where
1 π(k−1) πk
k, if 0 ≤ r ≤ 21 4 , ≤θ ≤ ,
q(r, θ ) = 1 8 8 (4.E.8)
π(k−1) πk
k + 4, if 21 4 ≤ r ≤ 1, 8
≤θ ≤ 8
for k = 1, 2, 3, 4.
Exercise 4.45 Two types of batteries have the pdf f (x) = 3λx 2 exp(−λx 3 )u(x) and
g(y) = 3μy 2 exp(−μy 3 )u(y), respectively, of lifetime with μ > 0 and λ > 0. When
the lifetimes of batteries are independent of each other, obtain the probability that
the battery with pdf f of lifetime lasts longer than that with g, and obtain the value
when λ = μ.
Exercise 4.46 Two i.i.d. random variables X and Y have marginal pdf f (x) =
e−x u(x).
(1) Obtain the pdf each of U = X + Y , V = X − Y , X Y , YX , Z = X
X +Y
, min(X, Y ),
min(X,Y )
max(X, Y ), and max(X,Y )
.
(2) Obtain the conditional pdf of V when U = u.
(3) Show that U and Z are independent of each other.
Exercise 4.47 Two Poisson random variables X 1 ∼ P (λ1 ) and X 2 ∼ P (λ2 ) are
independent of each other.
(1) Show that X 1 + X 2 ∼ P (λ1 + λ2 ).
(2) Show that the conditional distribution of X 1 when X 1 + X 2 = n is b n, λ1λ+λ
1
2
.
of X and pmf
1
, y = 0,
pY (y) = 2 (4.E.10)
1
2
, y=1
Exercise 4.57 Two exponential random variables T1 and T2 with rate λ1 and λ2 ,
respectively, are independent of each other. Let U = min (T1 , T2 ), V = max (T1 , T2 ),
and I be the smaller index, i.e., the index I such that TI = U .
(1) Obtain the expected values E{U }, E{V − U }, and E{V }.
(2) Obtain E{V } using V = T1 + T2 − U .
(3) Obtain the joint pdf fU,V −U,I of (U, V − U, I ).
(4) Are U and V − U independent of each other?
Exercise 4.58 Consider a bi-variate beta random vector (X, Y ) with joint pdf
Γ ( p1 + p2 + p3 ) p1 −1 p2 −1
f X,Y (x, y) = x y (1 − x − y) p3 −1
Γ ( p1 ) Γ ( p2 ) Γ ( p3 )
×u(x)u(y)u(1 − x − y), (4.E.11)
Exercise 4.60 Let the joint pdf of X and Y be f X,Y (x, y) = |x y|u(1 − |x|)u(1 −
|y|). When A = X 2 + Y 2 ≤ a 2 with 0 < a < 1, obtain the conditional joint cdf
FX,Y |A and conditional joint pdf f X,Y |A .
Exercise 4.62 When the cf of (X, Y ) is ϕ X,Y (t, s), show that the cf of Z = a X + bY
is ϕ X,Y (at, bt).
n!
f X,Y (x, y) = F i−1 (x){F(y) − F(x)}k−i−1
(i − 1)!(k − i − 1)!(n − k)!
×{1 − F(y)}n−k f (x) f (y)u(y − x), (4.E.13)
where i, k, and n are natural numbers such that 1 ≤ i < k ≤ n, F is the cdf of a
random variable, and f (t) = dtd F(t). Obtain the pdf of X and pdf of Y .
Exercises 333
E {X 1 } 1 − p2 E {X 2 } 1 − p1
= , = . (4.E.14)
E {X 3 } p2 E {X 3 } p1
where m ±X , f , m X , and σ X2 are the half means defined in (3.E.28), pdf, mean, and
variance, respectively, of X . Obtain the value of ρ X |X | and compare it with what can
be obtained intuitively in each of the following cases of the pdf f X (x) of X :
(1) f X (x) is an even function.
(2) f X (x) > 0 only for x ≥ 0.
(3) f X (x) > 0 only for x ≤ 0.
Exercise 4.66 For a random variable X with pdf f X (x) = u(x) − u(x − 1), obtain
the joint pdf of X and Y = 2X + 1.
Exercise 4.67 Consider a random variable X and its magnitude Y = |X |. Show that
the conditional pdf f X |Y can be expressed as
f X (x)δ(x + y) f X (x)δ(x − y)
f X |Y (x|y) = u(−x) + u(x) (4.E.16)
f X (x) + f X (−x) f X (x) + f X (−x)
for y ∈ {y | { f X (y) + f X (−y)} u(y) > 0}, where f X is the pdf of X . Obtain the
conditional pdf f Y |X (y|x). (Hint. Use (4.5.15).)
334 4 Random Vectors
Exercise 4.68 Show that the joint cdf and joint pdf are
⎧
⎪
⎪ FX (x)u y−a −x
⎨ c
+FX y−a u x − y−a , c > 0,
FX,cX +a (x, y) = c c (4.E.17)
⎪
⎪ F (x)u(y − a), c = 0,
⎩ X
FX (x) − FX y−a c
u x− y−a
c
, c<0
and
1
|c|
f X (x)δ y−a −x , c = 0,
f X,cX +a (x, y) = c (4.E.18)
f X (x)δ(y − a), c = 0,
Exercise 4.69 Let f and F be the pdf and cdf, respectively, of a continuous random
variable X , and let Y = X 2 .
(1) Obtain the joint cdf FX,Y . ∞
(2) Obtain the joint pdf f X,Y , and then confirm −∞ f X,Y (x, y)dy = f (x) and
∞
1 √ √
f X,Y (x, y)d x = √ f y + f − y u(y) (4.E.19)
−∞ 2 y
by integration.
(3) Obtain the conditional pdf f X |Y .
Exercise
∞ ∞ 4.70 Show that the pdf f X,cX shown in (4.5.9) satisfies
−∞ −∞ f X,cX (x, y)d yd x = 1.
Exercise 4.71 Express the joint cdf and joint pdf of the input X and output Y =
X u(X ) of a half-wave rectifier in terms of the pdf f X and cdf FX of X .
Exercise 4.72 Obtain (4.5.2) from FX,X +a (x, y) = FX (min(x, y − a)) shown in
(4.5.1).
shown in (4.2.20).
References
M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
J. Bae, H. Kwon, S.R. Park, J. Lee, I. Song, Explicit correlation coefficients among random variables,
ranks, and magnitude ranks. IEEE Trans. Inform. Theory 52(5), 2233–2240 (2006)
N. Balakrishnan, Handbook of the Logistic Distribution (Marcel Dekker, New York, 1992)
D.L. Burdick, A note on symmetric random variables. Ann. Math. Stat. 43(6), 2039–2040 (1972)
W.B. Davenport Jr., Probability and Random Processes (McGraw-Hill, New York, 1970)
H.A. David, H.N. Nagaraja, Order Statistics, 3rd edn. (Wiley, New York, 2003)
A.P. Dawid, Some misleading arguments involving conditional independence. J. R. Stat. Soc. Ser.
B (Methodological) 41(2), 249–252 (1979)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
S. Geisser, N. Mantel, Pairwise independence of jointly dependent variables. Ann. Math. Stat. 33(1),
290–291 (1962)
R.M. Gray, L.D. Davisson, An Introduction to Statistical Signal Processing (Cambridge University
Press, Cambridge, 2010)
R.A. Horn, C.R. Johnson, Matrix Analysis (Cambridge University Press, Cambridge, 1985)
N.L. Johnson, S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions (Wiley,
New York, 1972)
S.A. Kassam, Signal Detection in Non-Gaussian Noise (Springer, New York, 1988)
S.M. Kendall, A. Stuart, Advanced Theory of Statistics, vol. II (Oxford University, New York, 1979)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn.
(Prentice Hall, New York, 2008)
K.V. Mardia, Families of Bivariate Distributions (Charles Griffin and Company, London, 1970)
R.N. McDonough, A.D. Whalen, Detection of Signals in Noise, 2nd edn. (Academic, New York,
1995)
P.T. Nielsen, On the expected duration of a search for a fixed pattern in random data. IEEE Trans.
Inform. Theory 19(5), 702–704 (1973)
A. Papoulis, S.U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn.
(McGraw-Hill, New York, 2002)
V.K. Rohatgi, A.KMd.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New
York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
S.M. Ross, A First Course in Probability (Macmillan, New York, 1976)
336 4 Random Vectors
S.M. Ross, Stochastic Processes, 2nd edn. (Wiley, New York, 1996)
S.M. Ross, Introduction to Probability Models, 10th edn. (Academic, Boston, 2009)
G. Samorodnitsky, M.S. Taqqu, Non-Gaussian Random Processes: Stochastic Models with Infinite
Variance (Chapman and Hall, New York, 1994)
I. Song, J. Bae, S.Y. Kim, Advanced Theory of Signal Detection (Springer, Berlin, 2002)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
A. Stuart and J. K. Ord, Advanced Theory of Statistics: Vol. 1. Distribution Theory, 5th edn. (Oxford
University, New York, 1987)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
Y.H. Wang, Dependent random variables with independent subsets. Am. Math. Mon. 86(4), 290–292
(1979).
G.L. Wies, E.B. Hall, Counterexamples in Probability and Real Analysis (Oxford University, New
York, 1993)
Chapter 5
Normal Random Vectors
In this chapter, we consider normal random vectors in the real space. We first describe
the pdf and cf of normal random vectors, and then consider the special cases of bi-
variate and tri-variate normal random vectors. Some key properties of normal random
vectors are then discussed. The expected values of non-linear functions of normal
random vectors are also investigated, during which an explicit closed form for joint
moments is presented. Additional topics related to normal random vectors are then
briefly described.
Let us first describe the pdf and cf of normal random vectors (Davenport 1970; Kotz
et al. 2000; Middleton 1960; Patel et al. 1976) in general. We then consider additional
topics in the special cases of bi-variate and tri-variate normal random vectors.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 337
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8_5
338 5 Normal Random Vectors
When m = (0, 0, . . . , 0)T and all the diagonal elements of K are 1, the normal
distribution is called a standard normal distribution. We will in most cases assume
|K | > 0: when |K | = 0, the distribution is called a degenerate distribution and will
be discussed briefly in Theorems 5.1.3, 5.1.4, and 5.1.5.
The distribution of a normal random vector is often called a jointly normal distri-
bution and a normal random vector is also called jointly normal random variables.
It should be noted that ‘jointly normal random variables’ and ‘normal random vari-
ables’ are strictly different. Specifically, the term ‘jointly normal random variables’
is a synonym for ‘a normal random vector’. However, the term ‘normal random vari-
ables’ denotes several random variables with marginal normal distributions which
may or may not be a normal random vector. In fact, all the components of a non-
Gaussian random vector may be normal random variables in some cases as we shall
see in Example 5.2.3 later, for instance.
Example 5.1.1 For a random vector X ∼ N (m, K ), the mean vector, covariance
matrix, and correlation matrix are m, K , and R = K + m m T , respectively. ♦
where ω = (ω1 , ω2 , . . . , ωn )T .
−2 1
α= {(2π)
|K |} and y = (y1 , y2 , . . . , yn )T = x − m, the cf
n
Proof Letting
ϕ X (ω) = E exp jω X of X can be calculated as
T
1
ϕ X (ω) = α exp − (x − m)T K −1 (x − m) exp jω T x d x
2
x∈Rn
1
= α exp jω T m exp − y T K −1 y − 2 jω T y d y. (5.1.3)
2
y∈Rn
T
Now, recollecting that ω T y = ω T y = y T ω because ω T y is scalar and
T
that K = K T , we have y − j K T ω K −1 y − j K T ω = y T K −1 − jω T K
K −1 y − j K T ω = y T K −1 y − j y T K −1 K T ω − jω T y − ω T K T ω, i.e.,
T
y − j KTω K −1 y − j K T ω = y T K −1 y − 2 jω T y − ω T K ω. (5.1.4)
5.1 Probability Functions 339
ϕ X (ω) = α exp jω T m
1 T
× exp − y − j K T ω K −1 y − j K T ω + ω T K ω d y
2
y∈Rn
1 1
= α exp jω T m exp − ω T K ω exp − z T K −1 z d z
2 2
z∈Rn
1
= exp j m T ω − ω T K ω (5.1.5)
2
from (5.1.3). ♠
σ 2 σ2 σ 2 =
1 2
−2 1 − ρ ln(1 − α). As shown in Exercise 5.44, the major axis of the ellipse makes
2
the angle
1 2ρσ1 σ2
θ = tan−1 2 (5.1.9)
2 σ1 − σ22
with the positive x-axis. For ρ > 0, we have 0 < θ < π4 , θ = π4 , and π4 < θ < π2 when
σ1 > σ2 , σ1 = σ2 , and σ1 < σ2 , respectively, as shown in Figs. 5.1, 5.2, and 5.3.
Denoting the standard bi-variate normal pdf by f 2 , we have
y2
f 2 (0, y) = f 2 (0, 0) exp − , (5.1.10)
2 1 − ρ2
−1
where f 2 (0, 0) = 2π 1 − ρ2 .
m2
θ
m1 x
m2
m1 x
5.1 Probability Functions 341
m2
m1 x
Example 5.1.2 By integrating the joint pdf (5.1.8)over y and x, it is easy to see
that we have X ∼ N m 1 , σ12 and Y ∼ N m 2 , σ22 , respectively, when (X, Y ) ∼
N m 1 , m 2 , σ12 , σ22 , ρ . In other words, two jointly normal random variables are also
individually normal, which is a special case of Theorem 5.2.1. ♦
Example 5.1.3 For (X, Y ) ∼ N m 1 , m 2 , σ12 , σ22 , ρ , X and Y are independent if
ρ = 0. That is, two uncorrelated jointly normal random variables are independent,
which will be generalized in Theorem 5.2.3. ♦
Example 5.1.4 From Theorem 5.1.1, the joint cf is ϕ X,Y (u, v) = exp j m 1 u +
m 2 v − 21 σ12 u 2 + 2ρσ1 σ2 uv + σ22 v 2 for (X, Y ) ∼ N m 1 , m 2 , σ12 , σ22 , ρ . ♦
2 ∞
x − 2y − 41 y 2 d x = 2√1 π exp − 14 y 2 −∞ √12π exp − 21 v 2 dv of Y = X 1 + X 2
can eventually be obtained as
2
1 y
f Y (y) = √ exp − (5.1.11)
2 π 4
√
using (4.2.37) and letting v = 2 x − 2y . In other words, The sum of two indepen-
dent, standard normal
random
variables
is an N (0, 2) random variable. In general,
when X 1 ∼ N m 1 , σ12 , X 2 ∼ N m 2 , σ22 , and X 1 and X 2 are independent of each
other, we have Y = X 1 + X 2 ∼ N m 1 + m 2 , σ12 + σ22 . A further generalization of
this result is expressed as Theorem 5.2.5 later. ♦
Based on the results in Examples 4.2.10 and 5.1.6, we can show the following
theorem:
Theorem 5.1.2 When (X, Y ) ∼ N 0, 0, σ 2X , σY2 , ρ , we have the pdf
σ X σY 1 − ρ2
f Z (z) = 2 2 (5.1.12)
π σY z − 2ρσ X σY z + σ 2X
and cdf
1 1 σY z − ρσ X
FZ (z) = + tan−1 (5.1.13)
2 π σ X 1 − ρ2
of Z = X
Y
.
−1 2
Proof Let α = 2πσ X σY 1 − ρ2 and β = 2 1−ρ
1 z
2 − σ σ +
2ρz 1
. Using
( 2
) σX X Y σY2
∞ ∞
(4.2.27), we get f Z (z) = −∞ |v| f X,Y (zv, v)dv = α −∞ |v| exp − 2 1−ρ 1
2 2 ( 2)
z v
∞
+ σv2 v2
2 2
z2
− 2ρzv dv = 2α 0 v exp − 2 1−ρ − σ2ρz + σ12 dv, i.e.,
σ 2X σ X σY Y ( 2 ) σ2X X σY Y
∞
f Z (z) = 2α v exp −βv 2 dv. (5.1.14)
0
∞ ∞
2
Thus, noting that 0 v exp −βv 2
dv = − 1
2β
exp −βv = 2β 1
, we get the pdf
√ 2 2
0 −1
σ σ
of Z as f Z (z) = αβ = X Y π σY2 z − ρσ
1−ρ
σY
X
+ σ 2X 1 − ρ2 , which is the
same as (5.1.12).
σY ρσ X
Next, if we let tan θz = √ z− σY
for convenience, the cdf
1−ρ2 σ X
z √
σ X σY 1−ρ2 z
of Z can be obtained as FZ (z) = −∞ f Z (t)dt = πσ2 1−ρ2 −∞ b(t)dt =
√ √ X( )
σ X σY 1−ρ2 σ X 1−ρ2 θz π
− π2 dθ = π θz + 2 , leading to (5.1.13), where b(t) =
1
πσ X (1−ρ2 )
2 σY
2 −1
σY2 ρσ X
1 + σ2 1−ρ t − . ♠
X( )
2 σY
exp − (y−m 2)
2
2σ22 x − m1 y − m2
lim f X 1 ,X 2 (x, y) = √ δ −ξ (5.1.15)
ρ→±1 2πσ1 σ2 σ1 σ2
σ2
+ σ22 σ22
, i.e.,
2
(x − m 1 )2 (x − m 1 ) (y − m 2 ) (y − m 2 )2
− 2ρ +
σ12 σ1 σ2 σ22
2
x − m1 y − m2 y − m2 2
= −ρ + 1 − ρ2 , (5.1.17)
σ1 σ2 σ2
−1
where α = 2 1 − ρ2 . Now, noting that απ exp −αx 2 → δ(x) for α → ∞
as shown in Example 1.4.6 and that α → ∞ for ρ → ±1, we can obtain (5.1.15)
from (5.1.16). ♠
2σ22 2σ12
or
exp −ξ (x−m2σ1 )(y−m
1 σ2
2)
and the term σ11σ2 δ x−m
σ1
1
− ξ y−m
σ2
2
can be replaced with
δ (σ2 (x − m 1 ) − ξσ1 (y − m 2 )).
For a standard tri-variate normal random vector (X 1 , X 2 , X 3 ), let us denote the covari-
ance matrix as
⎡ ⎤
1 ρ12 ρ31
K 3 = ⎣ρ12 1 ρ23 ⎦ (5.1.18)
ρ31 ρ23 1
and the pdf as f 3 (x, y, z) = √ 1
exp − 21 (x y z)K −1
3 (x y z) , i.e.,
T
8π 3 |K 3 |
344 5 Normal Random Vectors
1
1
f 3 (x, y, z) = exp − 1 − ρ223 x 2 + 1 − ρ231 y 2
8π 3 |K 3 | 2 |K 3 |
+ 1 − ρ212 z 2 + 2c12 x y + 2c23 yz + 2c31 zx , (5.1.19)
− 1
where ci j = ρ jk ρki − ρi j . Then, we have f 3 (0, 0, 0) = 8π 3 |K 3 | 2 ,
|K 3 | = 1 − ρ212 + ρ223 + ρ231 + 2ρ12 ρ23 ρ31
= 1 − ρ2jk 1 − ρ2ki − ci2j
= αi2j,k 1 − βi2j,k , (5.1.20)
and
⎡ ⎤
1 − ρ223 c12 c31
1 ⎣
K −1 = c12 1 − ρ231 c23 ⎦ , (5.1.21)
3
|K 3 | c31 c23 1 − ρ212
where
αi j,k = 1 − ρ2jk 1 − ρ2ki (5.1.22)
−ci j
and βi j,k = , i.e.,
1−ρ2jk (1−ρ2ki )
ρi j − ρ jk ρki
βi j,k = (5.1.23)
αi j,k
|K 3 | → − ρ jk − sgn ρi j ρki → 0. ♦
1
Note also that f 3 (0, y, z) = f 3 (0, 0, 0) exp − 2 (0 y z)K −1
3 (0 y z)
T
= f 3 (0,
0, 0) exp − 2|K 3 | 1 − ρ31 y + 2c23 yz + 1 − ρ12 z , i.e.,
1 2 2 2 2
1 y2 2β23,1 yz
f 3 (0, y, z) = f 3 (0, 0, 0) exp − −
2 1 − β23,1
2 1 − ρ212 α23,1
z2
+ (5.1.24)
1 − ρ231
5.1 Probability Functions 345
and
1 − ρ212 2
f 3 (0, 0, z) = f 3 (0, 0, 0) exp − z . (5.1.25)
2 |K 3 |
∞
Example 5.1.8 Based on (5.1.24) and (5.1.25), we have h 1 (z) f 3 (0, 0, z)dz =
−∞
2
∞ z2
∞ w
√ 1 −∞ h 1 (z) exp − 2 2 dz = √ 23,1
−∞ h1 23,1 w exp − 2 dw , i.e.,
8π 3 |K 3 | 23,1 8π 3 |K 3|
⎛ ⎞
∞ ∞
√
1 |K 3 |
h 1 (z) f 3 (0, 0, z)dz = h1 ⎝ w⎠
−∞ 8π 1 − ρ12
3 2 −∞ 1 − ρ12 2
2
w
× exp − dw
2
⎧ ⎛ ⎞⎫
⎨ √ ⎬
1 |K 3 |
= E h1 ⎝ U⎠ (5.1.26)
⎩ ⎭
2π 1 − ρ212 1 − ρ212
for a uni-variate function h 1 , where 2
i j,k = 1 − βi2j,k 1 − ρ2jk and U ∼ N (0, 1).
We also have
∞ ∞ ∞ ∞
1
h 2 (y, z) f 3 (0, y, z)dydz = h 2 (y, z)
−∞ −∞ 8π 3 |K 3 | −∞ −∞
1 y2 2β23,1 yz z2
× exp − − + dydz
2 1 − β23,1
2 1 − ρ212 α23,1 1 − ρ231
∞ ∞
α23,1
= h2 1 − ρ12 v, 1 − ρ31 w
2 2
8π 3 |K 3 | −∞ −∞
1 2
× exp − v − 2β23,1 vw + w 2
dvdw
2 1 − β23,1
2
2πα23,1 1 − β23,1 2 ∞ ∞
= h2 1 − ρ12 v, 1 − ρ31 w
2 2
8π 3 |K 3 | −∞ −∞
× f 2 (v, w)|ρ=β23,1 dvdw
1
= √ E h2 1 − ρ12 V1 , 1 − ρ31 V2
2 2
(5.1.27)
2π
346 5 Normal Random Vectors
for a bi-variate function h 2 , where (V1 , V2 ) ∼ N 0, 0, 1, 1, β23,1 . The two results
(5.1.26) and (5.1.27) are useful in obtaining the expected values of some non-linear
functions. ♦
Denote by gρ (x, y, z) the standard tri-variate normal pdf f 3 (x, y, z) with the
covariance matrix K 3 shown in (5.1.18) so that the correlation coefficients ρ =
(ρ12 , ρ23 , ρ31 ) are shown explicitly. Then, we have
gρ (−x, y, z) = gρ (x, y, z) 1 , (5.1.28)
gρ (x, −y, z) = gρ (x, y, z) 2 , (5.1.29)
gρ (x, y, −z) = gρ (x, y, z) 3 , (5.1.30)
and
0 ∞ ∞
h(x, y, z)gρ (x, y, z)d xd ydz
−∞ 0 0
0 ∞ ∞
= h(x, y, −t)gρ (x, y, −t)d xd y(−dt)
∞ 0 0
∞ ∞ ∞
= h(x, y, −z)gρ (x, y, z)d xd ydz (5.1.31)
0 0 0 3
for a tri-variate function h(x, y, z). Here, k denotes the replacements of the corre-
lation coefficients ρ jk and ρki with −ρ jk and −ρki , respectively.
and
∂ ρki − ρi j ρ jk
sin−1 βi j,k = − √ (5.1.35)
∂ρ jk 1 − ρ2jk |K 3 |
After steps similar to those used in obtaining (5.1.15), we can obtain the following
theorem:
5.1 Probability Functions 347
Theorem 5.1.4 Letting ξi j = sgn ρi j , we have
)
exp − 21 x 2 {z − μ1 (x, y)}2
f 3 (x, y, z) → exp − δ (x − ξ12 y) (5.1.36)
2π 1 − ρ231 2 1 − ρ231
when ρ12 → ±1, where μ1 (x, y) = 21 ξ12 (ρ23 x + ρ31 y). We subsequently have
exp − 21 x 2
f 3 (x, y, z) → √ δ (x − ξ12 y) δ (x − ξ31 z) (5.1.37)
2π
−m 1 )
exp − (x1 2σ
2
*n
2 1 x1 − m 1 xi − m i
f X (x) → 1
δ − ξ1i (5.1.38)
2πσ12 σ
i=2 i
σ1 σi
when ρ1 j → ±1 for j = 2, 3, . . . , n, where ξ1 j = sgn ρ1 j .
Note in Theorem 5.1.5 that, when ρ1 j → ±1 for j ∈ {2, 3, . . . , n}, the value of ρi j
for i ∈ {2, 3, . . . , n} and j ∈ {2, 3, . . . , n} is determined as ρi j → ρ1i ρ1 j = ξ1i ξ1 j .
In the tri-variate case, for instance, when ρ12 → 1 and ρ31 → 1, we have ρ23 → 1
from lim |K 3 | = − (1 − ρ23 )2 ≥ 0.
ρ12 ,ρ13 →1
348 5 Normal Random Vectors
5.2 Properties
In this section, we discuss the properties (Hamedani 1984; Horn and Johnson 1985;
Melnick and Tenenbein 1982; Mihram 1969; Pierce and Dykstra 1969) of normal
random vectors. Some of the properties we will discuss in this chapter are based on
those described in Chap. 4. We will also present properties unique to normal random
vectors.
and
−1 Ψ 11 Ψ 12
K = . (5.2.2)
Ψ 21 Ψ 22
Ψ 11 = K −1 −1
11 + K 11 K 12 ξ
−1
K 21 K −1
11 , (5.2.3)
Ψ 12 = −K −1 −1
11 K 12 ξ , (5.2.4)
−1 −1
Ψ 21 = −ξ K 21 K 11 , (5.2.5)
and
Ψ 22 = ξ −1 , (5.2.6)
where ξ = K 22 − K 21 K −1
11 K 12 .
of (X, Y ), where g(x, y) = 1
2π
exp − 21 x 2 + y 2 . Then, we have the marginal pdf
⎧ 2
⎨ 1 exp − x 2 0
−∞ exp − 2 dy, x < 0,
y
π 2
f X (x) = 1 ∞
⎩ exp − x 2 exp − y2
dy, x ≥ 0
π 2 0 2
2
1 x
= √ exp − (5.2.9)
2π 2
of X . We can similarly show that Y is also a normal random variable. In other words,
although X and Y are both normal random variables, (X, Y ) is not a normal random
vector. ♦
Example 5.2.3 (Stoyanov 2013) Let φ1 (x, y) and φ2 (x, y) be two standard bi-
variate normal pdf’s with correlation coefficients ρ1 and ρ2 , respectively. Assume
that the random vector (X, Y ) has the joint pdf
where c1 > 0, c2 > 0, and c1 + c2 = 1. Then, when ρ1 = ρ2 , f X,Y is not a normal pdf
and, therefore, (X, Y ) is not a normal random vector. Now, we have X ∼ N (0, 1),
Y ∼ N (0, 1), and the correlation coefficient between X and Y is ρ X Y = c1 ρ1 + c2 ρ2 .
If we choose c1 = ρ2ρ−ρ
2
1
and c2 = ρ1ρ−ρ
1
2
for ρ1 ρ2 < 0, then c1 > 0, c2 > 0, c1 + c2 =
1, and ρ X Y = 0. In short, although X and Y are both normal and uncorrelated with
each other, they are not independent of each other because (X, Y ) is not a normal
random vector. ♦
Let us first consider a generalization of the result obtained in Example 5.1.5 that the
sum of two jointly normal random variables is a normal random variable.
n
Theorem 5.2.5 When the random variables X i ∼ N m i , σi2 i=1 are independent
of each other, we have
/ 0
.
n .
n .
n
Xi ∼ N mi , σi2 . (5.2.14)
i=1 i=1 i=1
Proof Because the cf of X i is ϕ X i (ω) = exp jm i ω − 21 σi2 ω 2 , we can obtain the cf
1
n n
+ +n
ϕY (ω) = exp jm i ω − 21 σi2 ω 2 = exp jm i ω − 21 σi2 ω 2 of Y = X i as
i=1 i=1 i=1
/ n 0 / n 0
. 1 . 2
ϕY (ω) = exp j mi ω − σi ω 2 (5.2.15)
i=1
2 i=1
Generalizing Theorem 5.2.5 further, we have the following theorem that a linear
transformation of a normal random vector is also a normal random vector:
Theorem
5.2.6 When X = (X 1 , X 2 , . . . , X n )T ∼ N (m, K ), we have L X ∼
N Lm, L K L T when L is an n × n matrix such that |L| = 0.
Proof First, we have X = L −1 Y because |L| = 0, and the Jacobian of the inverse
transformation x = g ( y) = L y is ∂∂y g −1 ( y) = L −1 = |L|
−1 −1 1
. Thus, we have
the pdf f Y ( y) = |L| f X (x) −1 of Y as
1
x=L y
352 5 Normal Random Vectors
T
exp − 21 L −1 y − m K −1 L −1 y − m
f Y ( y) = √ (5.2.16)
|L| (2π)n |K |
−1 −1 T T
from Theorem 4.2.1. Now, note that L T = L , L = |L|, and
−1 T −1 −1 −1 T −1 −1
L y−m K L y − m = ( y − Lm) L T
K L ( y − Lm). In
−1
T −1 −1 −1 −1 T
addition, letting H = L KL , we have H = L
T
K L = L
K −1 L −1 and |H| = L K L T = |L|2 |K |. Then, we can rewrite (5.2.16) as
1 1 −1
f Y ( y) = √ exp − ( y − Lm) H ( y − Lm) ,
T
(5.2.17)
(2π)n |H| 2
which implies L X ∼ N Lm, L K L T when X ∼ N (m, K ). ♠
Theorem 5.2.6 is a combined generalization of the facts that the sum of two jointly
normal random variables is a normal random variable, as described in Example 5.1.5,
and that the sum of a number of independent normal random variables is a normal
random variable, as shown in Theorem 5.2.5.
Example 5.2.5 For (X, Y ) ∼ N (10, 0, 4, 1, 0.5), find the numbers a and b so that
Z = a X + bY and W = X + Y are uncorrelated.
Solution Clearly, E{Z W } − E{Z }E{W } = E a X 2 + bY 2 + (a + b)X Y − 100a
= 5a + 2b because E{Z } = 10a and E{W } = 10. Thus, for any pair of two real
numbers a and b such that 5a + 2b = 0, the two random variables Z = a X + bY
and W = X + Y will be uncorrelated. ♦
Example 5.2.6 For a random vector (X, Y ) ∼ N (10, 0, 4, 1, 0.5), obtain the joint
distribution of Z = X + Y and W = X − Y .
Solution We first note that Z and W are jointly normal from Theorem 5.2.6. We
thus only need to obtain E{Z }, E{W }, Var{Z }, Var{W }, and ρ Z W . We first have
E{Z } = 10 and E{W
} = 10 from E{X 2± Y } = E{X } ± E{Y 3 }. Next, we have the
variance σ 2Z = E (X + Y − 10)2 = E {(X − 10) + Y }2 of Z as
√
3 1 2
f Z ,W (x, y) = exp − 3x − 6x y + 7y − 80y + 400 . (5.2.19)
2
12π 24
+
n
From Theorem 5.2.6, the linear combination ai X i of the components of a
i=1
normal random vector X = (X 1 , X 2 , . . . , X n ) is a normal random variable. Let us
again emphasize that, while Theorem 5.2.6 tells us that a linear transformation of
jointly normal random variables produces jointly normal random variables, a linear
transformation of random variables which are normal only marginally but not jointly
is not guaranteed to produce normal random variables (Wies and Hall 1993).
As we can see in Examples 5.2.7–5.2.9 below, when {X i }i=1 n
are all normal random
variables but X = (X 1 , X 2 , . . . , X n ) is not a normal random vector, (A) the normal
random variables {X i }i=1
n
are generally not independent even if they are uncorrelated,
(B) the linear combination of {X i }i=1 n
may or may not be a normal random variable,
and (C) the linear transformation of X is not a normal random vector.
Example 5.2.7 (Romano and Siegel 1986) Let X ∼ N (0, 1) and H be the outcome
from a toss of a fair coin. Then, we have Y ∼ N (0, 1) for the random variable
X, H = head,
Y = (5.2.20)
−X, H = tail.
Now,
2 2because 3 } = 0, E{Y } = 0, and E{X Y } = E{E{X Y |H }} =
E{X
1
2
E X + E −X 2 = 0, the random variables X and Y are uncorrelated. How-
ever, X and Y are not independent because, for instance, P(|X | > 1)P(|Y | < 1) > 0
while P(|X | > 1, |Y | < 1) = 0. In addition, X + Y is not normal. In other words,
even when X and Y are both normal random variables, X + Y could be non-normal
if (X, Y ) is not a normal random vector. ♦
354 5 Normal Random Vectors
for a positive number α. Then, X and Y are not independent. In addition, Y is also
a standard normal random variable because, for any set B such that B ∈ B(R), we
have
α
where φ denotes the standard normal pdf. Letting g(α) = 0 x 2 φ(x)d x, we can find
a positive number α0 such that1 g(α0 ) = 41 because g(0) = 0, g(∞) = 21 , and g
is a continuous function. Therefore, when α = α0 , X and Y are uncorrelated from
(5.2.23). Meanwhile, because
2X, |X | ≤ α,
X +Y = (5.2.24)
0, |X | > α,
X + Y is not normal. ♦
Example 5.2.9 (Stoyanov 2013) When X = (X, Y ) is a normal random vector, the
random variables X , Y , and X + Y are all normal. Yet, the converse is not necessarily
true. We now consider an example. Let the joint pdf of X = (X, Y ) be
1 1 2
f X (x, y) = exp − x + y 2
2π 2
2 1 2
× 1 + x y x − y exp − x + y + 2
2 2
, (5.2.25)
2
1 Here, α0 ≈ 1.54.
5.2 Properties 355
2
x y x − y 2 exp − 1 x 2 + y 2 + 2 ≤ 1 (5.2.26)
2
when ≥ −2 + ln 4 ≈ −0.6137 because −4e−2 ≤ x y x 2 − y 2 exp − 21 x 2 +
y 2 ≤ 4e−2 . Then, the joint cf of X can be obtained as
2
s + t2 st s 2 − t 2 s2 + t 2
ϕ X (s, t) = exp − + exp − − , (5.2.27)
2 32 4
Proof Theorem 5.2.7 can be proved from Theorems 4.3.5, 5.2.3, and 5.2.6, or from
(4.3.24) and (4.3.25). Specifically, when X ∼ N (m, K ) with |K | > 0, the eigen-
values of K are {λi }i=1
n
, and the eigenvectorcorresponding to λi is ai , assume the
matrix A and λ̃ = diag √1λ , √1λ , . . . , √1λ considered in (4.3.21) and (4.3.23),
1 2 n
respectively. Then, the mean vector of
Y = λ̃ A (X − m) (5.2.28)
2Here, as we have observed in Exercise 4.62, when the joint cf of X = (X, Y ) is ϕ X (t, s), the cf
of Z = a X + bY is ϕ Z (t) = ϕ X (at, bt).
356 5 Normal Random Vectors
the random vector V = L U − (10 0)T will be a vector of independent stan-
dard normal
/ random variables:
0 / 1 the 0covariance matrix of V is K V = L K U L H =
√ −√
1 1 √ √1
2 −1 10
6 6 6 2 = . ♦
√1
2
√1
2
−1 2 − 1
√ √1
6 2
0 1
In this section, expected values of some non-linear functions and joint moments (Bär
and Dittrich 1971; Baum 1957; Brown 1957; Hajek 1969; Haldane 1942; Holmquist
1988; Kan 2008; Nabeya 1952; Song and Lee 2015; Song et al. 2020; Triantafyl-
lopoulos 2003; Withers 1985) of normal random vectors are investigated. We first
consider a few simple examples based on the cf and mgf.
Let us start with some examples for obtaining joint moments of normal random
vectors via cf and mgf.
Example 5.3.1 For the joint central moment μi j = E (X − m X )i (Y − m Y ) j of a
random vector (X, Y ), we have observed that μ00 = 1, μ01 = μ10 = 0, μ20 = σ12 ,
μ02 = σ22 , and μ11 = ρσ1 σ2 in Sect. 4.3.2.1. In addition,
it is easy
to see that
μ30 = μ03 = 0 and μ40 = 3σ14 when (X, Y ) ∼ N 0, 0, σ12 , σ22 , ρ from (3.3.31).
2 2
that μ231 = 1 σ2 , μ22 = 1 + 2ρ σ1 σ2 ,
3 2
Now, based on the moment theorem, show 3ρσ
and μ41 = μ32 = 0 when (X, Y ) ∼ N 0, 0, σ1 , σ22 , ρ .
5.3 Expected Values of Nonlinear Functions 357
Solution For convenience, let C = E{X Y } = ρσ1 σ2 , A = 21 σ12 s12 + 2Cs1 s2
2
+σ22 s22 , and A(i j) = ∂i j A. Then, we easily have A(10) s1 =0,s2 =0 = σ12 s1
i+ j
3 ∂s ∂s
1 2 2 3
+ Cs2 s1 =0,s2 =0 = 0, A(01) s1 =0,s2 =0 = σ22 s2 + Cs2 s1 =0,s2 =0 = 0, A(20) = σ12 ,
A(11) = C, A(02) = σ22 , and A(i j) = 0 for i + j ≥ 3. Denoting the joint mgf of
(X, Y ) by M = M (s1 , s2 ) = exp(A) and employing the notation M (i j) = ∂ i Mj ,
i+ j
∂s1 ∂s2
2
we get M (10) = M A(10) , M (20) = M A(20) + A(10) ,
2
M (21) = M A(21) + A(20) A(01) + 2 A(11) A(10) + A(10) A(01)
2
= M A(20) A(01) + 2 A(11) A(10) + A(10) A(01) , (5.3.1)
M (31) = M 3A(20) A(11) + A(10) A(01)
2
+ A(10) 3A(11) + A(10) A(01) , (5.3.2)
2 2
M (22) = M A(20) A(02) + 2 A(11) + 4 A(11) A(10) A(01) + A(20) A(01)
2 2 (01) 2
+ A(10) A(02) + A(10) A , (5.3.3)
M (41) = B41 M, (5.3.4)
and
Here,
2 2
B41 = 3 A(20) A(01) + 12 A(20) A(11) A(10) + 6A(20) A(10) A(01)
3 3
+4 A(11) A(10) + A(10) A(01) (5.3.6)
and
2
B32 = 6A(20) A(11) A(01) + 3A(20) A(02) A(10) + 6A(11) A(10) A(01)
2 3 2
+6 A(11) A(10) + A(02) A(10) + 3A(20) A(10) A(01)
3 (01) 2
+ A(10) A . (5.3.7)
358 5 Normal Random Vectors
Recollecting
2 that M(0, 0) = 1, we have μ31 = 3ρσ13 σ2 from3 (5.3.2), μ22 =
1 + 2ρ σ1 σ2 from4 (5.3.3), μ41 = M (41) s1 =0,s2 =0 = 0 from (5.3.4) and (5.3.6),
2 2
and μ32 = M (32) s1 =0,s2 =0 = 0 from (5.3.5) and (5.3.7). ♦
In Exercise 5.15, it is shown that
E X 12 X 22 X 32 = 1 + 2 ρ212 + ρ223 + ρ231 + 8ρ12 ρ23 ρ31 (5.3.8)
E {X 1 X 2 X 3 } = m 1 E {X 2 X 3 } + m 2 E {X 3 X 1 } + m 3 E {X 1 X 2 }
−2m 1 m 2 m 3 (5.3.9)
E {X 1 X 2 X 3 X 4 } = E {X 1 X 2 } E {X 3 X 4 } + E {X 1 X 3 } E {X 2 X 4 }
+ E {X 1 X 4 } E {X 2 X 3 } − 2m 1 m 2 m 3 m 4 (5.3.10)
We now discuss a theorem that is quite useful in evaluating the expected values of
various non-linear functions such as the power functions, sign functions, and absolute
values of normal random vectors.
Denoting the covariance between X i and X j by
ρ̃i j = Ri j − m i m j , (5.3.11)
where Ri j = E X i X j and m i = E {X i }, the correlation coefficient ρi j between
ρ̃
X i and X j and variance σi2 of X i can be expressed as ρi j = √ i j and σi2 = ρ̃ii ,
ρ̃ii ρ̃ j j
respectively.
2 3
Theorem 5.3.1 Let K = ρ̃r s be the covariance matrix of an n-variate normal
random vector X. When {gi (·)}i=1
n
are all memoryless functions, we have
3 More specifically, we have μ31 = M (31) s =0,s =0 = 3M A(20) A(11) s =0,s =0 = 3σ12 C =
1 2 1 2
3ρσ13 σ2 .
2
4 More specifically, we have μ22 = M (22) s =0,s =0 = M A(20) A(02) + 2M A(11)
1 2
|s1 =0,s2 =0 = σ12 σ22 + 2C 2 = 1 + 2ρ2 σ12 σ22 .
5.3 Expected Values of Nonlinear Functions 359
1
n
n
∂ γ1 E gi (X i ) * (γ )
i=1 1
= γ E gi (X i ) ,
3
(5.3.12)
∂ ρ̃rk11s1 ∂ ρ̃rk22s2 · · · ∂ ρ̃rkNN s N 22 i=1
+
N +
N +N
where γ1 = k j , γ2 = k j δr j s j , and γ3 = i j k j . Here, δi j is the Kro-
j=1 j=1 j=1
necker delta function defined as (4.3.17), N ∈ 1, 2, . . . , 21 n(n + 1) , gi(k) (x) = ddx k
k
Based on (5.3.13) and (5.3.14), let us obtain the expected value E {g(X )} for a normal
random variable X .
Example 5.3.2 For a normal random variable X ∼ N m, σ 2 , obtain Υ̃ = E X 3 .
Solution Letting g(x) = x 3 , we have g (2) (x) = 6x. Thus, we get ∂∂ρ̃ Υ̃ =
1
2
E g (2) (X ) = 3E {X } = 3m, i.e., Υ̃ = 3m ρ̃ + c from (5.3.13) with k = 1, where
c is the integration constant. Subsequently, we have
E X 3 = 3mσ 2 + m 3 (5.3.15)
∞
because c = m 3 from Υ̃ → −∞ x 3 δ(x − m)d x = m 3 recollecting (5.3.14). ♦
360 5 Normal Random Vectors
We now derive a general formula for the moment E {X a }. Let us use an underline
as
n−1
, n is odd,
n= 2 (5.3.16)
n
2
, n is even
4n5
to denote the quotient 2
of a non-negative integer n when divided by 2.
Theorem 5.3.2 For X ∼ N (m, ρ̃), we have
.
a
a!
E Xa = ρ̃ j m a−2 j (5.3.17)
j=0
2 j j! (a − 2 j)!
for a = 0, 1, . . ..
When n = 2, specific simpler expressions of (5.3.12) for all possible pairs (n, N )
are shown in Table 5.1. Let us consider the expected value E {g1 (X 1 ) g2 (X 2 )} for
a normal random vector X = (X 1 , X 2 ) with mean vector m = (m 1 , m 2 ) and covari-
ρ̃ ρ̃
ance matrix K = 11 12 assuming n = 2, N = 1, r1 = 1, and s1 = 2 in Theorem
ρ̃12 ρ̃22
5.3.1. Because 11 = 1 and 21 = 1, we can rewrite (5.3.12) as
∂k
E {g1 (X 1 ) g2 (X 2 )} = E g1(k) (X 1 ) g2(k) (X 2 ) . (5.3.19)
∂ ρ̃k12
First, find a value k for which the right-hand side E g1(k) (X 1 ) g2(k) (X 2 ) of (5.3.19)
is simple to evaluate, and then obtain the expected value. Next, integrate the expected
value with respect to ρ̃12 to obtain E {g1 (X 1 ) g2 (X 2 )}. Note that, when ρ̃12 = 0, we
have ρ12 = σρ̃112σ2 = 0 and therefore X 1 and X 2 are independent of each other from
Theorem 5.2.3: this implies, from Theorem 4.3.6, that
E g1(k) (X 1 ) g2(l) (X 2 ) = E g1(k) (X 1 ) E g2(l) (X 2 ) (5.3.20)
ρ̃12 =0
5.3 Expected Values of Nonlinear Functions 361
Table 5.1 Specific formulas of Price’s theorem for all possible pairs (n, N ) when n = 2
N N n N
(n, N ) r j , s j j=1 , δr j s j j=1 , i j i=1 j=1 :
Specific formula of (5.3.12)
(2, 1) (r1 , s1 ) = (1, 1), δr1 s1 = 1, 11 = 2, 21 = 0:
∂k
1 k (2k)
E {g1 (X 1 ) g2 (X 2 )} = 2 E g1 (X 1 ) g2 (X 2 )
∂ ρ̃k11
(2, 1) (r1 , s1 ) = (1, 2), δr1 s1 = 0, 11 = 1,21 = 1:
∂k
E {g1 (X 1 ) g2 (X 2 )} = E g1 (X 1 ) g2(k) (X 2 )
(k)
∂ ρ̃k12
(2, 2) (r1 , s1 ) = (1, 1), (r2 , s2 ) = (1, 2), δr1 s1 = 1, δr2 s2 = 0,
11 = 2, 21 = 0, 12 = 1, 22 = 1:
∂ k1 +k2
k k E {g1 (X 1 ) g2 (X 2 )} =
∂ ρ̃111 ∂ ρ̃122
1 k1
2 E g1(2k1 +k2 ) (X 1 ) g2(k2 ) (X 2 )
(2, 2) (r1 , s1 ) = (1, 1), (r2 , s2 ) = (2, 2), δr1 s1 = 1, δr2 s2 = 1,
11 = 2, 21 = 0, 12 = 0, 22 = 2:
∂ k1 +k2
k k E {g1 (X 1 ) g2 (X 2 )} =
∂ ρ̃111 ∂ ρ̃222
1 k1 +k2 (2k ) (2k )
2 E g1 1 (X 1 ) g2 2 (X 2 )
(2, 3) (r1 , s1 ) = (1, 1), (r2 , s2 ) = (1, 2), (r3 , s3 ) = (2, 2),
δr1 s1 = 1, δr2 s2 = 0, δr3 s3 = 1,
11 = 2, 21 = 0, 12 = 1, 22 = 1, 13 = 0, 23 = 2:
∂ k1 +k2 +k3
k k k E {g1 (X 1 ) g2 (X 2 )} =
∂ ρ̃111 ∂ ρ̃122 ∂ ρ̃223
1 k1 +k3 (2k1 +k2 ) (k +2k3 )
2 E g1 (X 1 ) g2 2 (X 2 )
Example 5.3.4 For a normal random vector X= (X 1 , X 2 ) with mean vector m =
ρ̃ ρ̃
(m 1 , m 2 ) and covariance matrix K = 11 12 , obtain Υ̃ = E X 1 X 22 .
ρ̃12 ρ̃22
d Υ̃
Solution With k = 1, g1 (x) = x, and g2 (x) = x 2 in (5.3.19), we get d ρ̃12
=
E g1(1) (X 1 ) g2(1)
(X 2 ) = E {2X 2 } = 2m 2 , i.e., Υ̃ = 2m 2 ρ̃12 + c. Recollecting
(5.3.20), we have c = Υ̃ = m 1 ρ̃22 + m 22 . Thus, we finally have
ρ̃12 =0
E X 1 X 22 = 2m 2 ρ̃12 + m 1 ρ̃22 + m 22 . (5.3.21)
362 5 Normal Random Vectors
The result (5.3.21) is the same as the result E W Z 2 = 2m 2 ρσ1 σ2 + m 1 σ22 +
m 22 for a random vector (W, Z ) = (σ1 X + m 1 , σ2 Y + m 2 ) which we would obtain
after some steps based on E X Y 2 = 0 for (X, Y ) ∼ N (0, 0, 1, 1, ρ). In addition,
when X 1 = X 2 = X , (5.3.21) is the same as (5.3.15). ♦
a b
A general formula for the joint moment E X 1 X 2 is shown in the theorem below.
Theorem 5.3.3 The joint moment E X 1a X 2b can be expressed as
a− j b− j
. ..
min(a,b) j p q
a!b! ρ̃12 ρ̃11 ρ̃22 m 1 m2
a− j−2 p b− j−2q
E X 1a X 2b = (5.3.22)
j=0 p=0 q=0
2 p+q j! p!q!(a − j − 2 p)!(b − j − 2q)!
for (X 1 , X 2 ) ∼ N m 1 , m 2 , ρ̃11 , ρ̃22 , √ρ̃12 , where a, b = 0, 1, . . ..
ρ̃1 ρ̃22
+ +j 3−
2 2− +j j p q 2− j−2 p
12ρ̃12 ρ̃11 ρ̃22 m 1
Example 5.3.5 We can obtain E X 12 X 23 = 2 p+q j! p!q! (2− j−2 p)!
j=0 p=0 q=0
3− j−2q
m2
(3− j−2q)!
, i.e.,
E X 12 X 23 = m 21 m 32 + 3ρ̃22 m 21 m 2 + ρ̃11 m 32 + 6ρ̃12 m 1 m 22
+3ρ̃11 ρ̃22 m 2 + 6ρ̃12 ρ̃22 m 1 + 6ρ̃212 m 2 (5.3.23)
from (5.3.22). ♦
Theorem 5.3.4 For (X 1 , X 2 ) ∼ N m 1 , m 2 , σ12 , σ22 , ρ , the joint central moment
μab = E (X 1 − m 1 )a (X 2 − m 2 )b can be obtained as (Johnson and Kotz 1972;
Mills 2001; Patel and Read 1996)
⎧
⎨ 0, a + b is odd,
μab = +
t
(2ρσ1 σ2 )2 j+ξ (5.3.24)
a!b!
⎩ 2g+h+ξ (g− j)!(h− j)!(2 j+ξ)!
, a + b is even
j=0
where g and h are the quotients of a and b, respectively, when divided by 2; ξ is the
residue when a or b is divided by 2; and t = min(g, h).
5.3 Expected Values of Nonlinear Functions 363
Example 5.3.6 When a = 2g, b = 2h, and m 1 = m 2 = 0, all the terms except for
those satisfying a − j − 2 p = 0 and b − j − 2q = 0 will be zero in (5.3.22), and
thus we have
.
min(a,b)
a!b! j
E X 1a X 2b = ρ̃12 m 01 m 02
j=0,2,... 2
g+h− j j! 2 ! 2 !0!0!
a− j b− j
.
min(g,h)
a!b!
= (2ρ̃12 )2 j , (5.3.26)
j=0
2g+h (2 j)!(g − j)!(h − j)!
which is the same as the second line in the right-hand side of (5.3.24). Similarly,
when a = 2g + 1, b = 2h + 1, and m 1 = m 2 = 0, the result (5.3.22) is the same as
the second line in the right-hand side of (5.3.24). ♦
Solution First, note that ddx g(x) = ddx sgn(x) = 2δ(x) and that E {δ (X 1 ) δ (X 2 )}
= f (0, 0), where f denotes the pdf of N 0, 0, σ12 , σ22 , ρ . Letting k = 1 in (5.3.19),
we have d Υ̃
d ρ̃
= E g1(1) (X 1 ) g2(1) (X 2 ) = 4 f (0, 0) = √
2
, i.e.,
πσ1 σ2 1−ρ2
d Υ̃ 2 1
= (5.3.27)
dρ π 1 − ρ2
2
E {sgn (X 1 ) sgn (X 2 )} = sin−1 ρ + c. (5.3.28)
π
2
E {sgn (X 1 ) sgn (X 2 )} = sin−1 ρ. (5.3.29)
π
2 3
5 Here, the range of sin−1 x is set as − π2 , π2 .
364 5 Normal Random Vectors
Table 5.2 Expected value E {g1 (X 1 ) g2 (X 2 )} for some non-linear functions g1 and g2 of
(X 1 , X 2 ) ∼ N (0, 0, 1, 1, ρ)
g1 (X 1 )
X1 |X 1 | sgn (X 1 ) δ (X 1 )
g2 (X 2 ) X2 ρ 0 π ρ
2
0
|X 2 | 0 2
ρ sin −1 ρ + 1 − ρ2 0 1
1 − ρ2
π π
−1 ρ
sgn (X 2 ) π ρ
2 2
0 π sin 0
δ (X 2 ) 0 π 1−ρ
1 2 0 √1
2π 1−ρ2
Note. dρ d
ρ sin−1 ρ + 1 − ρ2 = sin−1 ρ. dρ d
sin−1 ρ = √ 1 2 .
1−ρ
E {|X i |} = π . E {δ (X i )} =
2 √1
.
2π
and the absolute moment νr s = E X 1r X 2s is
r +s
2 2 r +1 s+1 1 1 1 2
νr s = Γ Γ 2 F1 − r, − s; ; ρ . (5.3.31)
π 2 2 2 2 2
+3 k j δr j s j
∂ k1 +k2 +k3 1 j=1
k1 k2 k3
Υ̃ = E g1( 11 k1 + 12 k2 + 13 k3 ) (X 1 )
∂ ρ̃12 ∂ ρ̃23 ∂ ρ̃31 2
× g2( 21 k1 + 22 k2 + 23 k3 ) (X 2 ) g3( 31 k1 + 32 k2 + 33 k3 ) (X 3 )
( j)
E g1 (X 1 ) g2(k) (X 2 ) g3(l) (X 3 )
ρ̃12 =ρ̃23 =ρ̃31 =0
( j)
= E g1 (X 1 ) E g2(k) (X 2 ) E g3(l) (X 3 ) (5.3.33)
and
( j)
E g1 (X 1 ) g2(k) (X 2 ) g3(l) (X 3 )
ρ̃31 =ρ̃12 =0
( j)
= E g1 (X 1 ) E g2(k) (X 2 ) g3(l) (X 3 ) (5.3.34)
Consider a standard tri-variate normal random vector X ∼ N (0, K 3 ) and its pdf
f 3 (x, y, z) as described in Sect. 5.1.3. For the partial moment
∞ ∞ ∞
[r, s, t] = x r y s z t f 3 (x, y, z)d xd ydz (5.3.36)
0 0 0
1 .
c
π −1
[1, 1, 1] = √ |K 3 | + ρi j + ρ jk ρki + sin βi j,k , (5.3.41)
8π 3 2
2
ν211 = (ρ23 + 2ρ12 ρ31 ) sin−1 ρ23 + 1 + ρ212 + ρ231 1 − ρ223 , (5.3.42)
π
and
6The last term |K 3 | ρ23 1 − ρ223 of [2, 0, 0] and the last two terms ρ23 1 − ρ223 + ρ31 1 − ρ231
of [1, 1, 0] given in (Johnson andKotz 1972, Kamat 1958) some references should be corrected
|K 3 |ρ23
into as in (5.3.39) and ρ23 1 − ρ231 + ρ31 1 − ρ223 as in (5.3.40), respectively.
1−ρ223
5.3 Expected Values of Nonlinear Functions 367
2
ν221 = 1 + 2ρ212 + ρ223 + ρ231 + 4ρ12 ρ23 ρ31 − ρ223 ρ231 . (5.3.43)
π
+
c
In (5.3.40) and (5.3.41), the symbol denotes the cyclic sum: for example, we have
.
c
sin−1 ρi j = sin−1 ρ12 + sin−1 ρ23 + sin−1 ρ31 . (5.3.44)
We will have 2+3−1 3
= 4, 3+3−13
= 10, 4+3−1 3
= 20 different7 cases, respec-
tively, of the expected value E {g1 (X 1 ) g2 (X 2 ) g3 (X 3 )} for two, three, and
four options as the function gi . For a standard tri-variate normal random vec-
tor X = (X 1 , X 2 , X 3 ), consider four functions {x, |x|, sgn(x), δ(x)} of gi (x).
Among the 20 expected values, due to the symmetry of the standard normal
distribution, the four expected values E {X 1 X 2 sgn (X 3 )}, E {X 1 sgn (X 2 ) sgn (X 3 )},
E {sgn (X 1 ) sgn (X 2 ) sgn (X 3 )}, E {X 1 X 2 X 3 } of products of three odd functions
and the six expected values E {X 1 δ (X 2 ) δ (X 3 )}, E {sgn (X 1 ) |X 2 | |X 3 |}, E {sgn
(X 1 ) |X 2 | δ (X 3 )}, E {sgn (X 1 ) δ (X 2 ) δ (X 3 )}, E {X 1 |X 2 | |X 3 |}, E {X 1 |X 2 | δ
(X 3 )} of products of two odd functions and one even function are zero.
In addition, we easily get E {δ (X 1 ) δ (X 2 ) δ (X 3 )} = f 3 (0, 0, 0), i.e.,
1
E {δ (X 1 ) δ (X 2 ) δ (X 3 )} = (5.3.45)
8π 3 |K 3 |
based on (5.1.19), and the nine remaining expected values are considered in
Exercises 5.21–5.23. The results of these ten expected values are summarized
in Table 5.3. Meanwhile, some results shown in Table 5.3 can be verified using
(5.3.38)–(5.3.43): for instance, we can reconfirm E {X 1 |X 2 | sgn (X 3 )} = 21 ∂ν 211
= π2
∂ρ31
ρ12 sin−1 ρ23 + ρ31 1 − ρ223 via ν211 shown in (5.3.42) and E {X 1 X 2 |X 3 |}
= 14 ∂ρ∂12 ν221 = π2 (ρ12 + ρ23 ρ31 ) via ν221 shown in (5.3.43).
a = {a1 , a2 , . . . , an } (5.3.46)
Table 5.3 Expected values E {g1 (X 1 ) g2 (X 2 ) g3 (X 3 )} of some products for a standard tri-variate
normal random vector (X 1 , X 2 , X 3 )
E {δ (X 1 ) δ (X 2 ) δ (X 3 )} = √ 13
8π |K 3 |
l = l11 , l12 , . . . , l1n , l22 , l23 , . . . , l2n , . . . , ln−1,n−1 , ln−1,n , lnn (5.3.47)
of l, where
.
n
L a,k = ak − lkk − l jk (5.3.49)
j=1
for k = 1, 2, . . . , n and
l ji = li j (5.3.50)
for j > i. A general formula for the joint moments of normal random vectors can
now be obtained as shown in the following theorem:
2 3
Theorem 5.3.5 For X ∼ N (m, K ) with m = (m 1 , m 2 , . . . , m n ) and K = ρ̃i j , we
have
n ⎛ ⎞⎛ ⎞
* a . *n *
n *
n
da,l ⎝ ρ̃i j ⎠ ⎝ m j ⎠ ,
l L
E Xkk = i j a, j
(5.3.51)
k=1 l∈Sa i=1 j=i j=1
5.3 Expected Values of Nonlinear Functions 369
where
/ n 0⎛ ⎞−1 ⎛ ⎞−1
* *
n *
n *
n
da,l = 2−Ml ak ! ⎝ l i j !⎠ ⎝ L a, j !⎠ (5.3.52)
k=1 i=1 j=i j=1
+
n
with Ml = lii .
i=1
Note that, when any of the 21 n(n + 1) elements of l or any of the n elements of
/ 0−1 / 0−1
n 1
n 1 n 1n
L a, j j=1 is a negative integer, we have li j ! L a, j ! = 0 because
i=1 j=i j=1
+
(−k)! → ±∞ for k = 1, 2, . . .. Therefore, the collection Sa in of (5.3.51) can
l∈Sa
be replaced with the collection of all sets of 21 n(n + 1) integers. Details for obtaining
E X 1 X 2 X 32 based on Theorem 5.3.5 as an example are shown in Table 5.4 in the
case of a = {1, 1, 2}.
and
3
Table 5.4 Element sets l = {l11 , l12 , l13 , l22 , l23 , l33 } of Sa , L a, j j=1 , coefficient da,l , and the
terms in E X 1 X 2 X 3 for each of the seven element sets when a = {1, 1, 2}
2
{l11 , l12 , l13 , l22 , l23 , l33 } L a,1 , L a,2 , L a,3 da,l Terms
1 {0, 0, 0, 0, 0, 0} {1, 1, 2} 1 m 1 m 2 m 23
2 {0, 0, 0, 0, 1, 0} {1, 0, 1} 2 2ρ̃23 m 1 m 3
3 {0, 0, 1, 0, 0, 0} {0, 1, 1} 2 2ρ̃13 m 2 m 3
4 {0, 0, 1, 0, 1, 0} {0, 0, 0} 2 2ρ̃13 ρ̃23
5 {0, 1, 0, 0, 0, 0} {0, 0, 2} 1 ρ̃12 m 23
6 {0, 0, 0, 0, 0, 1} {1, 1, 0} 1 ρ̃33 m 1 m 2
7 {0, 1, 0, 0, 0, 1} {0, 0, 0} 1 ρ̃12 ρ̃33
370 5 Normal Random Vectors
Note that (5.3.53) is the same as (5.3.9) and (5.3.35), and (5.3.54) is the same as
(5.3.10). ♦
When the mean vector is 0 in Theorem 5.3.5, we have
n / n 0 ⎛ ⎞
* a * . * n * n
ρ̃
li j
2−Ml ⎝ ⎠,
i j
E Xkk = ak ! (5.3.55)
k=1 k=1 i=1 j=i
l i j !
l∈Ta
+
n +
n
where Ta denotes the collection of l such that l11 + lk1 = a1 , l22 + lk2 = a2 ,
k=1 k=1
+
n n n
. . ., lnn + lkn = an , and li j ≥ 0 j=i i=1
. In other words, Ta is the same as Sa
k=1
with L a,k ≥ 0 replaced by L a,k = 0 in (5.3.48).
+
n
Theorem 5.3.6 We have E X 1a1 X 2a2 · · · X nan = 0 for X ∼ N (0, K ) when ak is
k=1
an odd number.
+
n +
n
Proof Adding lkk + lk j = ak for k = 1, 2, . . . , n, we have ak = M l +
j=1 k=1
+
n +
n +
n−1 +
n +n
li j = 2Ml + 2 li j , which is an even number. Thus, when ak is
i=1 j=1 i=1 j=i+1
a1 a2 k=1
an odd number, the collection Ta is a null set and E X 1 X 2 . . . X nan = 0. ♠
Example 5.3.10 For a zero-mean n-variate normal random vector, assume a =
1 = {1, 1, . . . , 1}. When n is an odd number, E {X 1 X 2 · · · X n } = 0 from Theorem
5.3.6. Next, assume n is even. Over a non-negative integer region, if lkk = 0 for
k = 1, 2, . . . , n and one of {l1k , l2k , . . . , lnk } − {lkk } is 1 and all the others are 0, we
+ n
lik = 1. Now, if l ∈ Ta , because da,l = d1,l = 1n(1!)
n
have lkk + = 1, we
i=1 20 1! (0!)n
i=1
have (Isserlis 1918)
.*
n *
n
l
E {X 1 X 2 · · · X n } = ρ̃iijj . (5.3.56)
l∈Ta i=1 j=i
Next, assigning 0, 1, and 0 to lkk , one of {l1k , l2k , . . . , lnk } − {lkk }, and all the others
of {l1k , l2k , . . . , lnk } − {lkk }, respectively, for k = 1, 2, . . . , n is the same as assigning
1 to each pair after dividing {1, 2, . . . , n} into n pairs of two numbers. Here, a pair
( j, k) represents the subscript of l jk . Now, recollecting that there are n! possibilities
for the same choice with a different order, the number of ways to divide {1, 2, . . . , n}
−1
into n pairs of two numbers is n2 n−2 2
· · · 22 n! = 2n! n n! . In short, the number of
elements in Ta , i.e., the number of non-zero terms on the right-hand side of (5.3.56),
n n! = (2n − 1)!!.
is 2n! ♦
5.4 Distributions of Statistics 371
Often, the terms sample and random sample (Abramowitz and Stegun 1972; Grad-
shteyn and Ryzhik 1980) are used to denote an i.i.d random vector, especially
in statistics. A function of a sample is called a statistic. With a sample8 X =
(X 1 , X 2 , . . . , X n ) of size n, the mean E {X i } and the variance Var (X i ) of the com-
ponent random variable X i are called the population mean and population variance,
respectively. Unless stated otherwise, we assume that population mean and popula-
σ , respectively,
2
tion variance are m and for the samples considered in this section.
We also denote by E (X i − m)k = μk the k-th population central moment of X i
for k = 0, 1, . . ..
1.
n
Xn = Xi (5.4.1)
n i=1
and variance
σ2
Var X n = (5.4.3)
n
8In several fields including engineering, the term sample is often used to denote an element X i of
X = (X 1 , X 2 , . . . , X n ).
372 5 Normal Random Vectors
⎡ ⎤
⎢+
n +
n +
n ⎥
1 ⎢ E X i2 + E Xi X j ⎥ 2 1 n σ 2 + m 2 + n(n − 1)m 2 − m 2 .
n2 ⎣ ⎦ − m = n2
i=1 i=1 j=1
i= j
♠
Example 5.4.1 (Rohatgi and Saleh 2001) Obtain the third central moment μ3 X n
of the sample mean X n .
n
3 +
Solution The third central moment μ3 X n = E X n − m = n3 E
1
3 i=1
(X i − m) of X = (X 1 , X 2 , . . . , X n ) can be expressed as
1 . 1 ..
n n n
μ3 X n = 3 E (X i − m)3 + 3 E (X i − m)2 X j − m
n i=1 n i=1 j=1
i= j
1 ...
n n n
+ 3 E (X i − m) X j − m (X k − m) . (5.4.4)
n i=1 j=1 k=1
i= j, j=k,k=i
Now, noting that E {X i − m} =0 and that X i and X j are independent of each
other for i = j, we have E (Xi − m)2 X j − m = E (X i − m)2 E X j −
m = 0 for i = j and E (X i − m) X j − m (X k − m) = E {X i − m}
E X j − m E {X k − m} = 0 for i = j, j = k, and k = i. Thus, we have
μ3
μ3 X n = 2 (5.4.5)
n
+
n
from μ3 X n = 1
n3
E (X i − m)3 . ♦
i=1
1 . 2
n
Wn = Xi − X n (5.4.6)
n − 1 i=1
E {Wn } = σ 2 , (5.4.7)
i.e., the expected value of sample variance is equal to the population variance.
5.4 Distributions of Statistics 373
Proof Let Yi = X i − m. Then, we have E {Yi } = 0, E Yi2 = σ 2 = μ2 , and
+
n +
n
E Yi4 = μ4 . Next, letting Y = n1 Yi = n1 (X i − m), we have
i=1 i=1
.
n
2 .
n
2
Xi − X = Yi − Y (5.4.8)
i=1 i=1
n
+ 2 n
+ 2 +
n n
+ 2
from Xi − X = Xi − m − X − m = Yi − n1 Xk − m . In
i=1 i=1 i=1 k=1
n
+ 2 n
+ 2
+
n +
n
addition, because Yi − Y = Yi2 − 2Y Yi + Y = Yi2 − 2Y Yi +
i=1 i=1 i=1 i=1
2 +
n 2
nY = Yi2 − nY , we have
i=1
.
n
2 .
n
2
Xi − X = Yi2 − nY . (5.4.9)
i=1 i=1
n
+ 2 +n 2
Therefore, we have E Xi − X = E Yi2 − nE Y = nσ 2 −
i=1 i=1
+
n + n 2 3
n
n2
E Yi Y j = nσ 2 − n1 nE Yi2 + 0 = (n − 1)σ 2 and E {Wn } =
i=1 j=1
n
1 +
2
E n−1 Xi − X = σ2 . ♠
i=1
Note that, due to the factor n − 1 instead of n in the denominator of (5.4.6), the
expected value of sample variance is equal to the population variance as shown in
(5.4.7).
μ4 (n − 3)μ22
Var {Wn } = − (5.4.10)
n n(n − 1)
2
+
n 2 +
n +
n
Proof Letting Yi = X i − m, we have E Yi2 − nY =E Yi2
i=1 i=1 j=1
2+
n 4
Y j2 − 2nY Yi2 + n 2 Y , i.e.,
i=1
374 5 Normal Random Vectors
⎧/ 02 ⎫ ⎧ ⎫
⎨ .n ⎬ ⎨.n .
n .
n ⎬
2 2 4
E Yi2 − nY = E Yi2 Y j2 − 2nY Yi2 + n 2 Y
⎩ ⎭ ⎩ ⎭
i=1 i=1 j=1 i=1
2 .
n
4
= nμ4 + n(n − 1)μ22 − 2nE Y Yi2 + n2E Y . (5.4.11)
i=1
2 +
n +
n +
n +
n +
n +
n +
n
In (5.4.11), E Y Yi2 = 1
n2
E Yi2 Y j Yk = 1
n2
E Yi4 + Yi2 Y j2 can
i=1 i=1 j=1 k=1 i=1 i=1 j=1
i= j
be evaluated as
2 .
n
1
E Y Yi2 = μ4 + (n − 1)μ22 (5.4.12)
i=1
n
4 +
n +
n +
n +
n +
n +
n +
n
and E Y = 1
n4
E Yi Y j Yk Yl = 1
n4
E Yi4 +3 Yi2 Y j2 can
i=1 j=1 k=1 l=1 i=1 i=1 j=1
i= j
be obtained as9
4 1
E Y = μ4 + 3(n − 1)μ22 . (5.4.13)
n3
Next, recollecting
(5.4.9), (5.4.12),
and (5.4.13), if
we rewrite (5.4.11),
n
)
+ 2 2 +
n 2
2
we have E Xi − X =E Yi2 − nY = nμ4 + n(n − 1)μ22 −
i=1 i=1
n2
2n
n
μ4 + (n − 1)μ22 + n3
μ4 + 3(n − 1)μ22 , i.e.,
⎡ 2 ⎤
.
n
2
(n − 1) 2 (n − 1) n 2 − 2n + 3 2
E⎣ Xi − X ⎦= μ4 + μ2 . (5.4.14)
i=1
n n
2 )
n
+ 2
We get (5.4.10) from Var {Wn } = 1
(n−1)2
E Xi − X − μ22 using (5.4.7)
i=1
and (5.4.14). ♠
9In this formula, the factor 3 results from the three distinct cases of i = j = k = l, i = k = j = l,
and i = l = k = j.
5.4 Distributions of Statistics 375
and consequently
√
n
X n − μ ∼ N (0, 1) (5.4.16)
σ
for a sample X = (X 1 , X 2 , . . . , X n ) from N μ, σ 2 .
Proof Recollecting the mgf M(t) = exp μt + 21 σ 2 t 2 of N μ, σ 2 and
n
using (5.E.23), the mgf of X n can be obtained as M X n (t) = M nt =
2
1 √σ
exp μt + 2 n
t 2 . Thus, we have (5.4.15), and (5.4.16) follows. ♠
Theorem 5.4.4
can also √
be shown from Theorem 5.2.5. More generally, we
σ2
n
√
have X n ∼ N μ, n and n X n − μ ∼ N (0, 1) when X i ∼ N μi , σi2 i=1
σ2
+
n +
n
are independent of each other, where μ = 1
n
μi and σ 2 = 1
n
σi2 .
i=1 i=1
Theorem 5.4.5 The sample mean and sample variance of a normal sample are
independent of each other.
Proof We first show that A = X n and B = (V1 , V2 , . . . , Vn ) = X 1 − X n , X 2
−X n , . . . , X n − X n are independent of each other. Letting t = (t, t1 , t2 , . . . ,
tn ), the
joint mgf M A,B t
= E exp t X n + t 1 V1 + t 2 V2 + · · · + t n Vn =
+n
E exp t X n + ti X i − X n of A and B is obtained as
i=1
/ n 0 / n 0 )
. .
M A,B t = E exp ti X i − ti − t X n
i=1 i=1
⎧
⎡ ⎛ ⎞ ⎫⎤
⎨. n
1 .n ⎬
= E ⎣exp ⎝nti + t − t j ⎠ Xi ⎦ . (5.4.17)
⎩ n ⎭
i=1 j=1
+
n
Letting t = 1
n
ti , the joint mgf (5.4.17) can be expressed as M A,B t =
i=1
2
1
n
t+nti −nt 1
n 1
n
exp μ t+ntni −nt + σ2 t+ntni −nt
2
E exp X i n = = exp
i=1 i=1 i=1
1 n
σ2
exp μt σ2 t 2
= exp μt + σ2nt
2 2
μ ti − t + 2n 2 t −t 2
2 2nt ti − t + n i n + 2n 2
i=1
1
n 2 1n
σ2 σ2 t
exp 2 ti − t exp μ + n ti − t , or as
i=1 i=1
376 5 Normal Random Vectors
σ2 . 2
n
σ2 t 2
M A,B t = exp μt + exp ti − t
2n 2 i=1
n
σ2 t .
× exp μ+ ti − t . (5.4.18)
n i=1
n
+ +n
Noting that ti − t = ti − nt = 0, we eventually have
i=1 i=1
σ2 . 2
n
σ2 t 2
M A,B t = exp μt + exp ti − t . (5.4.19)
2n 2 i=1
Meanwhile, because A = X n ∼ N μ, σn as we have observed in Theorem 5.4.4,
2
the mgf of A is
σ2 2
M A (t) = exp μt + t . (5.4.20)
2n
+ n
Recollecting ti − t = 0, the mgf of B = (V1 , V2 , . . . , Vn ) can be obtained
i=1 n
+
as M B (t) = E {exp (t1 V1 + t2 V2 + · · · + tn Vn )} = E exp ti X i − X n
n i=1
2 + +n 1
n 2 3 1n
= E exp ti X i − t Xi = E exp X i ti − t = exp
i=1 i=1 n i=1 i=1
2 + 2 +
n 2
μ ti − t + σ2 (ti − t ti − t + σ2
2
= exp μ ti − t or, equivalently,
i=1 i=1
as
σ2 . 2
n
M B (t) = exp ti − t , (5.4.21)
2 i=1
1 r
n r 2 −1 exp − u(r )
n
f (r ) = n (5.4.22)
2 Γ 2
2 2
is called the (central) chi-square pdf with its distribution denoted by χ2 (n), where n
is called the degree of freedom.
The central chi-square pdf (5.4.22), an exampleof which is shown in Fig. 5.4, is
α−1
the same as the gamma pdf f (r ) = β α Γ (α) r
1
exp − β u(r ) introduced in (2.5.31)
r
with α and β replaced by 2 and 2, respectively: in other words, χ2 (n) = G n2 , 2 .
n
Theorem 5.4.6 The square of a standard normal random variable is a χ2 (1) random
variable.
1
MY (t) = (1 − 2t)− 2 , t <
n
(5.4.23)
2
and moments
Γ k + n2
E Y k
= 2k
(5.4.24)
Γ n2
0 r
378 5 Normal Random Vectors
Proof Using (5.4.22), the mgf MY (t) = E etY can be obtained as
∞
1 n
−1 1 − 2t
MY (t) = n x 2 exp − x d x. (5.4.25)
0 2 2 Γ n2 2
(2)
k
can easily be obtained as E Y k = dtd k MY (t) = (−2)k − n2 − n2 − 1 · · ·
n t=0
− 2 − (k − 1) = 2k n2 n2 + 1 · · · n2 + k − 1 , resulting in (5.4.24). ♠
Example 5.4.2 For Y ∼ χ2 (n), we have the expected value E{Y } = n and variance
Var(Y ) = 2n from (5.4.24). ♦
Example 5.4.3 (Rohatgi and Saleh 2001) For X n ∼ χ2 (n), obtain the limit distri-
butions of Yn = Xn 2n and Z n = Xnn .
n→∞
t n2 t
2t − 2t · n
t
lim M X n n 2 = lim 1 − n 2 = lim exp n = 1 and, consequently,
n→∞ n→∞ n→∞
− n
·t
P (Yn = 0) → 1. Similarly, we get lim M Z n (t) = lim 1 − 2tn 2t = et and,
n→∞ n→∞
consequently, P (Z n = 1) → 1. ♦
n +
n
When X i ∼ χ2 (ki ) i=1
are independent of each other, the mgf of Sn = Xi
i=1
+
n
1
n
−
ki − 21 ki
can be obtained as M Sn (t) = (1 − 2t) 2 = (1 − 2t) i=1 based on the mgf
i=1
shown in (5.4.23) of X i . This result proves the following theorem:
n
Theorem 5.4.8 When X i ∼ χ2 (ki ) i=1 are independent of each other, we have
+n +n n
X i ∼ χ2 ki . In addition, if X i ∼ N μi , σi2 i=1 are independent of each
i=1 i=1
+n 2
X i −μi
other, then σi
∼ χ2 (n).
i=1
√
Recollecting Γ 21 = π shown in (1.4.83), it is easy to see that (5.4.26) with δ = 0
is the same as (5.4.22): in other words, χ2 (n, 0) is χ2 (n). In Exercise 5.32, it is shown
that
E{Y } = n + δ, (5.4.27)
σY2 = 2n + 4δ, (5.4.28)
and
δt 1
MY (t) = (1 − 2t)− 2 exp
n
, t< (5.4.29)
1 − 2t 2
are the mean, variance, and mgf, respectively, for Y ∼ χ2 (n, δ).
n
Theorem 5.4.9 If X i ∼ χ2 (ki , δi ) i=1 are independent of each other, then Sn =
+n +
n +
n
X i ∼ χ2 ki , δi .
i=1 i=1 i=1
Proof From the mgf shown in (5.4.29) of X i , the mgf of Sn can be obtained as
+n
1n k
− 2i
− 21 ki
t +
n
M Sn (t) = (1 − 2t) exp 1−2t = (1 − 2t)
tδi i=1 exp 1−2t δi . In other
i=1 i=1
+
n +n +
n
words, X i ∼ χ2 ki , δi . ♠
i=1 i=1 i=1
5.4.3 t Distribution
f (r ) = n √
2
1+ (5.4.30)
Γ 2 nπ n
is called the central t pdf with the corresponding distribution denoted as t (n), where
the natural number n is called the degree of freedom.
380 5 Normal Random Vectors
0 r
The central t pdf with the degree of freedom of 1 is a Cauchy pdf: in other words,
t (1) = C(0, 1). Figure 5.5 shows an example of the central t pdf.
53 a
Example 5.4.4 When f (v) = is a pdf, obtain the value of a.
( 2 )3
5+v
∞ ∞
2 −3
Solution From −∞ f (v)dv = 53 a −∞ 5 + v dv = 1, we get a = √8 .
3 5π
53 a 53 Γ (3)
Alternatively, comparing (5.4.30) and f (v) = , we have 53 a = √ , i.e.,
Γ ( 25 ) 5π
(5+v2 )3
a= 3√
2√
= √8 . ♦
4 π 5π 3 5π
√ X
n √ ∼ t (n) (5.4.31)
Y
√
Proof Let T = n √XY and W = Y . Then, X = T Wn and the Jacobian of the
inverse transformation (X, Y ) = g −1 (T, W ) = T Wn , W is J g −1 (t, w) =
w √1 √t
∂
∂(t,w) g (t, w) = n n 2 w = wn . Thus, the joint pdf of (T, W ) can be
−1
0 1
w 2 n −1
obtained as f T,W (t, w) = 2πn exp − t2nw n2w 2 n exp − w2 u(w), i.e.,
2 Γ(2)
n−1
w 2 w t2
f T,W (t, w) = √ n exp − 1+ u(w). (5.4.32)
2πn2 2 Γ n2 2 n
5.4 Distributions of Statistics 381
w t2 2
Next, letting 2
1+= v, we have 1 + tn dw = 2dv. Thus, the pdf of T
n
∞ n+1
2 − 2 ∞ n−1
can be obtained as f T (t) = −∞ f T,W (t, w)dw = √πnΓ1 n 1 + tn 0 v
2
(2)
−v
e dv, or as
− n+1
Γ n+1 t2 2
f T (t) = n √
2
1+ , (5.4.33)
Γ 2 nπ n
n−1
Wn ∼ χ2 (n − 1) (5.4.34)
σ2
and
√ Xn − μ
n √ ∼ t (n − 1) (5.4.35)
Wn
n
+
Proof Recollecting that X i − X n = (X i − μ) − X n − μ and (X i − μ) X n − μ =
i=1
2 n
+ 2
n X n − μ , we can rewrite the sample variance as Wn = 1
n−1 Xi − X n =
i=1
+
n 2 +
n 2
1
n−1 (X i − μ)2 − 2 (X i − μ) X n − μ + X n − μ = 1
n−1 (X i − μ)2 − n X n − μ .
i=1 i=1
Thus, we have
.n
Xi − μ 2 n−1 n 2
= Wn + 2 X n − μ . (5.4.36)
i=1
σ σ 2 σ
Now, we have
n 2
X n − μ ∼ χ2 (1) (5.4.37)
σ2
.n
Xi − μ 2
∼ χ2 (n) (5.4.38)
i=1
σ
2
as observed in Theorem 5.4.8. Recollecting that n−1
σ2
Wn and σn2 X n − μ are inde-
pendent of each other as discussed in Theorem 5.4.5, the mgf of the statistic
+n 2
X i −μ
σ
in (5.4.36) can be obtained as
i=1
n ) 2 )
. Xi − μ 2 (n − 1)Wn n Xn − μ
E exp t = E exp t +t
i=1
σ σ2 σ2
2 )
(n − 1)Wn n Xn − μ
= E exp t E exp t . (5.4.39)
σ2 σ2
n−1
Wn ∼ χ2 (n − 1). (5.4.41)
σ2
√ √
n ( X n −μ) − 21 √
Next, the distribution of n−1× σ
(n−1)Wn
σ2
= n X√nW−μ is t (n
n
− 1) from Theorem 5.4.10. ♠
√
7( X 7 −1)
Example 5.4.6 For a sample from N 1, σ 2 , it is easy to see that T = √
W7
∼
t (6) from Theorem 5.4.11. ♦
√ X
Z = n√ (5.4.42)
Y
is called the non-central t distribution with the degree of freedom of n and non-
centrality parameter δ = σμ , and is denoted by t (n, δ).
2 δ 2x 2
f (x) = √ n+1 n . (5.4.43)
π n + x 2 2 j=0 Γ 2 j! n + x2
Comparing the pdf’s (5.4.30) and (5.4.43), it is easy to see that the non-central t
distribution t (n, 0) is the same as the central t distribution t (n). In Exercise 5.34, we
obtain the mean
Γ n−1 n
E{Z } = δ n
2
, n>1 (5.4.44)
Γ 2 2
and variance
n−1 2
n(1 + δ 2 ) nδ 2 Γ
Var{Z } = − n2 , n>2 (5.4.45)
n−2 2 Γ 2
of Z ∼ t (n, δ).
−2ρ + , (5.4.46)
σ1 σ2 σ2
s2 2 3
E X 2 Y = 1 − ρ2 n + 1 + (n − 2)ρ2 Y 2 (5.4.47)
(n − 1)2
for (X, Y ) ∼ t 0, 0, s 2 , 1, ρ, n .
384 5 Normal Random Vectors
5.4.4 F Distribution
is called the central F pdf with the degree of freedom of (m, n) and its distribution
is denoted by F(m, n).
The F distribution, together with the chi-square and t distributions, plays an
important role in mathematical statistics. In Exercise 5.35, it is shown that the moment
of H ∼ F(m, n) is
n k Γ m + k Γ n − k
E H k
= 2
2 (5.4.49)
m Γ m2 Γ n2
9n:
for k = 1, 2, . . . , 2
− 1. Figure 5.6 shows the pdf of F(4, 3).
Theorem 5.4.12 (Rohatgi and Saleh 2001) We have
nX
∼ F(m, n) (5.4.50)
mY
Proof Let H = mY nX
. Assuming the auxiliary variable V = Y , we have X =
m
H V and Y = V . Because the Jacobian of the inverse transformation (X, Y ) =
n
m m
v 0
g −1 (H, V ) = mn H V, V is J g −1 (r, v) = ∂(r,v) ∂
g −1 (r, v) = mn = v, we
n
r 1
n
have the joint pdf f H,V (r, v) = mv f
n X,Y n
m
vr, v of (H, V ) as
mv m
f H,V (r, v) = fX vr f Y (v). (5.4.51)
n n
0 r
5.4 Distributions of Statistics 385
∞ ∞ mv m
Now, the marginal pdf f H (r ) = −∞ f H,V (r, v)dv = −∞ n f X n vr f Y
m m −1 m 1 n2 n2 −1
∞ 1 2 m
n vr
2
vr v
exp − v2 u(v)u m
2 2
(v)dv = mv
n −∞
m exp − n 2 n
n vr dv =
Γ 2 Γ 2
m n
1 2 1 2
m m −1 ∞ m+n
2 −1 exp − v2 m
2
m 2
n Γ m Γ n n r 2 0 v n r + 1 dv u(r ) of H can be obtained
2 2
as10
m 1 m m2 −1 ∞ 1 m+n
2
v 2 −1
m+n
f H (r ) = m n r
n Γ 2 Γ 2 n 0 2
v m
× exp − r + 1 dv u(r )
m+n 2 n m
Γ m m 2 −1 m − m+n
= m 2 n
2
r 1+ r u(r ) (5.4.52)
Γ 2 Γ 2 n n n
1 m2 m −1
x 2 −1 e− 2 u(x)
m x
by noting that f X (x) = Γ 2 and f Y (y) =
1 n2 n −1 − 2y
2
2 −1
n
2
Γ 2 y e u(y). ♠
Example 5.4.7 Show that ∼ F(n, m) when X ∼ F(m, n).
1
X
− m+n m2 −1
Solution If we obtain the pdf f Y (y) = y12 (Γn )m (Γ 2n ) 1 + mn 1y
m
Γ m+n 2 m 1
( 2 ) (2) n y
m+n
u 1y = y12 (Γn )m (Γ 2n ) ny u(y) = Γ m( Γ2 )n mn 2 y − 2 −1 (ny) 2
m
Γ m+n m
1− 2 ny 2 Γ m+n m m m+n
m n − m+n
f Y (y) = Γ m( Γ2 )n mn 2 y 2 −1 y + mn
Γ m+n 2
u(y), i.e.,
( 2 ) (2)
Γ m+n n n n2 −1 n − m+n
f Y (y) = m n
2
2
y 1+ y u(y) (5.4.53)
Γ 2 Γ 2 m m m
by noting that u 1
y
= u(y). ♦
Example 5.4.8 When n → ∞, find the limit of the pdf of F(m, n).
Solution Let us first rewrite the pdf f F (x) as
1 1 Γ m+n mx m2 mx − m+n
m n
2
f F (x) = 2
1+ u(x). (5.4.54)
Γ 2 x Γ 2 n n
; <= >
A
n m2 mx m2 mx m2
Using (1.4.77), we have lim A = lim 2 n
= 2
and
n→∞ n→∞
m − (m+n)
2
m − n2 m − m2
lim 1 + x = lim 1 + x 1+ x
n→∞ n n→∞
m n n
= exp − x . (5.4.55)
2
Thus, letting a = m
2
, we get
a
lim f F (x) = (ax)a−1 exp(−ax)u(x). (5.4.56)
n→∞ Γ (a)
In other words, when n → ∞, F(m, n) → G m2 , m2 , where G(α, β) denotes the
gamma distribution described by the pdf (2.5.31). ♦
Example 5.4.9 In Example 5.4.8, we obtained the limit of the pdf of F(m, n) when
n → ∞. Now, obtain the limit when m → ∞. Then, based on the result, when
X ∼ F(m, n) and m → ∞, obtain the pdf of X1 .
Solution Rewrite the pdf f F (x) as
n2 m2
1 1 Γ m+n n mx
f F (x) = n m
2
u(x). (5.4.57)
Γ 2 x Γ 2 n + mx n + mx
; <= >; <= >
B C
b+1
1 b b
lim f F (x) = exp − u(x) (5.4.58)
m→∞ bΓ (b) x x
n
n n2 n n2
noting that lim B = lim ( 2 Γ) m ( 2 ) × mx
m 2
Γ m
= 2x from (1.4.77) and that
m→∞ m→∞ (2)
n −2
m
lim C = lim 1 + mx = exp − 2x n
. Figure 5.7 shows the pdf of F(m, 10)
m→∞ m→∞
n) for m → ∞.
for some values of m, and Fig. 5.8 shows three pdf’s11 of F(m,
Next, for m → ∞, the pdf lim f X1 (y) = lim y 2 f F 1y = bΓ1(b) y12 (by)b+1
1
m→∞ m→∞
exp(−by)u 1y of X1 can be obtained as
b
lim f X1 (y) = (by)b−1 exp(−by)u(y). (5.4.59)
m→∞ Γ (b)
fF (x)
0.8
m → ∞, n = 10
0.6 m = 10, n = 10
m = 20, n = 10
0.4 m = 100, n = 10
0.2
x
0 1 2 3 4 5
2.5
2
n = 0.5
1.5 n = 10
n = 100
1
0.5
0 1 2 3 4 x
Fig. 5.8 The limit lim f F(m,n) (x) of the pdf of F(m, n)
m→∞
n
In other words, 1
X
∼G 2
, n2 when m → ∞. ♦
Theorem
5.4.13
(Rohatgi and Saleh 2001) If X = (X 1 , X 2 , . . . , X m ) from
N μ X , σ 2X and Y = (Y1 , Y2 , . . . , Yn ) from N μY , σY2 are independent of each
other, then
σY2 W X,m
∼ F(m − 1, n − 1) (5.4.60)
σ 2X WY,m
and
388 5 Normal Random Vectors
?
X m − μ X − Y n − μY m+n−2
∼ t (m + n − 2), (5.4.61)
(m−1)W X,m (n−1)WY,m σ 2X σY2
σ 2 + σ 2 m
+ n
X Y
where X m and W X,m are the sample mean and sample variance of X, respectively,
and Y n and WY,n are the sample mean and sample variance of Y , respectively.
(m−1) (n−1)
Proof From (5.4.41), we have σ 2X
W X,m ∼ χ2 (m − 1) and σY2
WY,n ∼ χ2 (n −
1).Thus, (5.4.60) follows from Theorem 5.4.12. Next, noting that X m − Y n ∼
σ 2X σY2
N μ X − μY , m + n , σ2 W X,m + (n−1)
(m−1)
σY2
WY,n ∼ χ2 (m + n − 2), and these two
X
statistics are independent of each other, we easily get (5.4.61) from Theorem 5.4.10.
♠
nX
H = (5.4.62)
mY
is called the non-central F distribution with the degree of freedom of (m, n) and
non-centrality parameter δ, and is denoted by F(m, n, δ).
Here, the pdf (5.4.63) for δ = 0 indicates that F(m, n, 0) is the central F distribution
F(m, n). In Exercise 5.36, we obtain the mean
n(m + δ)
E{H } = (5.4.64)
m(n − 2)
Appendices
The general formula (5.3.51) for the joint moments of normal random vectors is
proved via mathematical induction here.
First note that
⎧ ⎫
n ⎪
⎪ ⎪
∂ * a ⎨
a −1 a −1
* a⎪
n ⎬
E X k k = ai a j E X i i X j j
Xkk (5.A.1)
∂ ρ̃i j ⎪
⎪ ⎪
⎪
k=1 ⎩ k=1 ⎭
k=i, j
for i = j and
⎧ ⎫
⎪
⎪ ⎪
⎪
∂ *n
1 *⎨
n ⎬
E X kak = ai (ai − 1) E X iai −2 X kak (5.A.2)
∂ ρ̃ii 2 ⎪
⎪ ⎪
⎪
k=1 ⎩ k=1 ⎭
k=i
a− j b− j
. ..
min(a,b)
j p q a− j−2 p b− j−2q
E X 1a X 2b = da,b, j, p,q ρ̃12 ρ̃11 ρ̃22 m 1 m2 . (5.A.3)
j=0 p=0 q=0
Then, when ρ̃12 = 0, we have E X 1a X 2b = E X 1a E X 2b , i.e.,
.
a
.
b
p q a−2 p b−2q
da,b,0, p,q ρ̃11 ρ̃22 m 1 m2
p=0 q=0
.
a
p a−2 p
.
b
q b−2q
= da, p ρ̃11 m1 db,q ρ̃22 m 2 (5.A.4)
p=0 q=0
p a−2 p
because the coefficient of ρ̃11 m 1 is da, p = 2 p p!(a−2
a!
p)!
when E X 1a is
expanded as we can see from (5.3.17). Thus, we get
a!b!
da,b,0, p,q = . (5.A.5)
2 p+q p!q!(a − 2 p)!(b − 2q)!
390 5 Normal Random Vectors
∂
E X 1a X 2b = abE X 1a−1 X 2b−1 . (5.A.6)
∂ ρ̃12
∂
+ a−
min(a,b) +j b−
+j j−1 p q
∂ ρ̃12
E X 1a X 2b = j da,b, j, p,q ρ̃12 ρ̃11 ρ̃22
j=1 p=0 q=0
+ a−1−
min(a,b)−1 + j b−1−
+j a−1− j−2 p b−1− j−2q
= ( j + 1)da,b, j+1, p,q m 1 m2 (5.A.7)
j=0 p=0 q=0
and
a−1− j b−1− j
.
min(a−1,b−1) . . j p q
E X 1a−1 X 2b−1 = da−1,b−1, j, p,q ρ̃12 ρ̃11 ρ̃22
j=0 p=0 q=0
a−1− j−2 p b−1− j−2q
×m 1 m2 , (5.A.8)
ab
da,b, j+1, p,q = da−1,b−1, j, p,q (5.A.9)
j +1
a!b!
da,b, j+1, p,q = ,
2 p+q p!q!( j + 1)!(a − j − 1 − 2 p)!(b − j − 1 − 2q)!
(5.A.10)
+
n
from (5.3.51) with n = m − 1 and (5.3.17), where Ml = lii , a1 = {ai }i=1
m−1
, a2 =
i=1
m−1 m−1 1
m 1
m
a1 ∪ {am }, l 1 = li j j=i i=1
, l2 = l1 ∪ {lim }i=1
m
, ζm (l) = li j !, and ηk,m =
i=1 j=i
1
m
L ak , j !. Here, the symbol → denotes a substitution: for example, α → β means
j=1
the substitution of α with β.
Next, employing (5.A.1) with (i, j) = (1, m), (2, m), . . . , (m − 1, m), we will
get
2 3
2 3 ai !am ! da2 ,l 2 lim →0,ai →ai −lim −1,am →am −lim −1
da2 ,l 2 lim →lim +1 = (5.A.13)
(ai − lim − 1)! (am − lim − 1)! (lim + 1)!
2 3 a1 !a2 !am !
da2 ,l 2 l3m →0 =
l1m !l2m ! (a1 − l1m )! (a2 − l2m )! (am − l1m − l2m )!
2 3
× da2 ,l 2 lkm →0 for k=1,2,3, a1 →a1 −l1m , a2 →a2 −l2m , am →am −l1m −l2m , (5.A.17)
392 5 Normal Random Vectors
1
3
ak ! am !
k=1
da2 ,l 2 =
1
3 1
3 +
3
lkm ! (ak − lkm )! am − lkm !
k=1 k=1 k=1
2 3
× da2 ,l 2 +
3 . (5.A.18)
lkm →0, ak →ak −lkm for k=1,2,3; am →am − lkm
k=1
2 3
If we repeat the steps above until we reach i = m − 1, using da2 ,l 2 lm−1,m →0
obtained by letting lm−1,m = 0 in (5.A.18) with i = m − 2 and recollecting (5.A.14)
with i = m − 1, we will eventually get
m−1
1
ak ! am !
k=1
da2 ,l 2 = m−1 m−1
1 1 +
m−1
lkm ! (ak − lkm )! am − lkm !
k=1 k=1 k=1
2 3
× da2 ,l 2 +
m−1 . (5.A.19)
lkm →0, ak →ak −lkm for k=1,2,...,m−1; am →am − lkm
k=1
+
m−1 +
m
Finally, noting that am − lkm − 2lmm = am − lmm − lkm = L a2 ,m and
k=1 k=1
2 3 +
m−1 +
m
that L a1 , j ak →ak −lkm for k=1,2,...,m−1
= a j − l jm − l j j − lk j = a j − l j j − lk j = L a2 , j
k=1 k=1
for j = 1, 2, . . . , m − 1, if we combine (5.A.12) and (5.A.19), we can get
m−1
1
ak ! am !
k=1
da2 ,l 2 = m−1 m−1
1 1 +
m−1
lkm ! (ak − lkm )! am − lkm !
k=1 k=1 k=1
m−1
1 +
m−1
(ak − lkm )! am − lkm !
k=1 k=1
×
+
m−1
2 ζm−1 (l) lmm ! am −
Ml 2
lkm − 2lmm !D2,2
k=1
1
m
ak !
k=1
= , (5.A.20)
2 Ml 2 ζm (l) η2,m (l)
Appendices 393
1
m−1
which implies that (5.3.51) holds true also when n = m, where D2,2 =
2 3 j=1
.
n .
n
Q(x) = ai j xi x j (5.A.21)
j=1 i=1
of x = (x1 , x2 , . . . , xn ), consider
∞ ∞ ∞
Jn = ··· exp{−Q(x)} d x, (5.A.22)
0 0 0
for a11 > 0. When n = 2, assume Q(x) = a11 x12 + a22 x22 + 2a12 x1 x2 , where Δ2 =
a11 a22 − a12
2
> 0. We then get
1 π a12
J2 = √ − tan−1 √ . (5.A.24)
Δ2 2 Δ2
In addition, when n = 3 assume Q(x) = a11 x12 + a22 x22 + a33 x32 + 2a12 x1 x2 +
2a23 x2 x3 + 2a31 x3 x1 , where Δ3 = a11 a22 a33 − a11 a23
2
− a22 a31
2
− a33 a12
2
+
2a12 a23 a31 > 0 and {aii > 0}i=1 . Then, we will get
3
√ / 0
π . −1 ai j aki − aii a jk
c
π
J3 = √ + tan √ (5.A.25)
4 Δ3 2 aii Δ3
+c
after some manipulations, where denotes the cyclic sum defined in (5.3.44).
Now, recollect the standard normal pdf
2
1 x
φ(x) = √ exp − (5.A.26)
2π 2
394 5 Normal Random Vectors
∞
defined in (3.5.2) and (3.5.3), respectively. Based on (5.A.23) or on −∞ exp
∞ ∞
−αx 2 d x = απ shown in (3.3.28), we get −∞ φm (x)d x = 1 m2 −∞ exp
2
(2π)
− mx2 d x, i.e.,
∞
φm (x) d x = (2π)− m− 2 .
m−1 1
2 (5.A.28)
−∞
For n = 0, 1, . . ., consider
∞
In (a) = 2π Φ n (ax) φ2 (x) d x
−∞
∞
= Φ n (ax) exp −x 2 d x. (5.A.29)
−∞
∞
1 2m+1
Φ(ax) − exp −x 2 d x = 0, (5.A.32)
−∞ 2
+
2m+1 i
which can subsequently be expressed as − 21 2m+1 Ci I2m+1−i (a) = 0 from
i=0
1 2m+1 +
2m+1 i
(5.A.29) and then Φ(ax) − 2
= − 21 2m+1 Ci Φ
2m+1−i
(ax). This result
i=0
in turn can be rewritten as
Appendices 395
.
2m+1
I2m+1 (a) = 2−i (−1)i+1 2m+1 Ci I2m+1−i (a) (5.A.33)
i=1
for m = 0, 1, . . . after some steps. Thus, when m = 0, from (5.A.30) and (5.A.33),
we get I1 (a) = 21 I0 (a), i.e.,
√
π
I1 (a) = . (5.A.34)
2
3 1
I3 (a) = I2 (a) − I0 (a). (5.A.35)
2 4
d
Next, recollecting that da Φ(ax) = xφ(ax) and da d
Φ 2 (ax) = 2xΦ(ax)φ(ax),
if we differentiate I2 (a) with respect to a using Leibnitz’s ∞rule (3.2.18), inte-
grate by parts, and then use (3.3.29), we get da d
I2 (a) = 2π −∞ 2xΦ(ax)φ(ax)φ2
∞ ∞
2 Φ(ax) 2+a 2 2
(x) = √2 −∞ Φ(ax)x exp − 2+a 2 x 2 dx = 2
π − 2+a 2 exp − 2 x +
2π 2
x=−∞
∞ 2
2+a x ∞
π 2+a 2 −∞ φ(ax) exp − d x = π 2+a exp − 1 + a 2 x 2 d x , i.e.,
2 a a
2 ( 2 ) −∞
d a π
I2 (a) = . (5.A.36)
da π 2 + a2 1 + a2
√
Consequently, noting (5.A.31) and d
da
tan−1 1 + a2 = a√
(2+a 2 ) 1+a 2
from
−1
d
dx
tan x= 1
1+x 2
, we finally obtain
1
I2 (a) = √ tan−1 1 + a 2 , (5.A.37)
π
and then,
√
3 −1 π
I3 (a) = √ tan 1+a −
2 (5.A.38)
2 π 4
from (5.A.35) and (5.A.37). The results {Jk }3k=1 and {Ik (a)}3k=0 we have derived so
far, together with φ (x) = −xφ(x), are quite useful in obtaining the moments of
order statistics of standard normal distribution for small values of n.
396 5 Normal Random Vectors
fGG (x)
2.5
2 σG = 1
k = 0.5
1.5 k=1
k=2
1
k = 10
0.5 k=∞
0 x
−3 −2 −1 0 1 2 3
As it is also clear in Fig. 5.9, the pdf of the generalized normal distribution is
a unimodal even function, defined by two parameters. The two parameters are the
variance σG2 and the rate k of decay of the pdf.
The generalized normal pdf is usefully employed in representing many pdf’s by
adopting appropriate values of k. For example, when k = 2, the generalized normal
pdf is a normal pdf. When k < 2, the generalized normal pdf is an impulsive pdf:
specifically, when k = 1, the generalized normal pdf is the double exponential pdf
0 / √
1 2|x|
f D (x) = √ exp − . (5.A.40)
2σG σG
√
lim A G (k) = 3σG (5.A.42)
k→∞
and lim k
= → ∞, the limit of the exponential function
√1 . Next, for k
k→∞ 2 A G (k)Γ ( k )
1
2 3σG
√
in (5.A.39) is 1 when |x| ≤ A G (k), or equivalently when |x| ≤ 3σG , and 0 when
|x| > A G (k). Therefore, for k → ∞, we have
1 √
f GG (x) → √ u 3σG − |x| . (5.A.43)
2 3σG
In other words, for k → ∞, the limit of the generalized normal pdf is a uniform pdf
as shown in Fig. 5.9.
B̃c (k, v)
f GC (x) = (5.A.44)
v+ k1
D̃c (x)
is called the generalized Cauchy distribution and is denoted by G C (k, v). Here, k > 0,
v > 0, B̃c (k, v) = 1 ( k ) 1 , and D̃c (x) = 1 + v1 AG|x|(k) .
kΓ v+ 1 k
2v k A G (k)Γ (v)Γ ( k )
Figure 5.10 shows the generalized Cauchy pdf. When the parameter v is finite,
the tail of the generalized Cauchy pdf shows not an exponential behavior, but an
algebraic behavior. Specifically, when |x| is large, the tail of the generalized Cauchy
pdf f GC (x) decreases in proportion to |x|−(kv+1) .
When k = 2 and 2v is an integer, the generalized Cauchy pdf is a t pdf, and when
k = 2 and v = 21 , the generalized Cauchy pdf is a Cauchy pdf
σG
f C (x) = . (5.A.45)
π x2+ σG2
v+ k1
When the parameters σG2 and k are fixed, we have lim D̃c (x) = lim
v→∞ v→∞
1
k v+ k
1 + v1 AG|x|(k) , i.e.,
Appendices 399
fGC (x)
2.5 k = 0.5
v = 10
2 σG = 1
k=1
1.5 k=2
1 k = 10
0.5 k=∞
0 x
−3 −2 −1 0 1 2 3
k )
v+ 1 |x|
lim D̃c k (x) = exp . (5.A.46)
v→∞ A G (k)
lim ( k )
Γ v+ 1
In addition, lim B̃c (k, v) = k
2 A G (k)Γ ( k1 ) v→∞ v k1 Γ (v)
= k
2 A G (k)Γ ( k1 )
because
v→∞
Γ (v+ k1 )
lim 1 = 1 from (1.4.77). Thus, for v → ∞, the generalized Cauchy
v→∞ v k Γ (v)
pdf converges to the generalized normal pdf. For example, when k = 2 and v → ∞,
the generalized Cauchy pdf is a normal pdf.
Next, using lim pΓ ( p) = lim Γ ( p + 1) = 1 shown in (1.4.76)
√
p→0 p→0
and lim A G (k) = 3σG shown in (5.A.42), we get lim B̃c (k, v) =
k→∞ k→∞
Γ (v)
lim = when v is fixed. In addition, lim D̃c (x) = 1
√1
√k ( k )
k→∞ 2 A G (k)Γ (v)
1
Γ 1 2 3σG
√ k→∞
when |x| < 3σG and lim D̃c (x) = ∞ when |x| > 3σG . In short, when v is
k→∞
fixed and k → ∞, we have
1 √
f GC (x) → √ u 3σG − |x| , (5.A.47)
2 3σG
i.e., the limit of the generalized Cauchy pdf is a uniform pdf as shown also in
Fig. 5.10.
After some steps, we can obtain the r -th moment
1
Γ v − rk Γ r +1 2 −1
r
r Γ
= v σGr r 3
r k k
E X k (5.A.48)
Γ (v)Γ 2 k
400 5 Normal Random Vectors
The class of stable distributions is also a useful class for modeling impulsive envi-
ronments for a variety of scenarios. Unlike the generalized Gaussian and generalized
Cauchy distributions, the stable distributions are defined by their cf’s.
In Definition 5.A.3, the numbers m, α, β, and γ are called the location parameter,
characteristic exponent, symmetry parameter, and dispersion parameter, respectively.
The location parameter m represents the mean when 1 < α ≤ 2 and the median when
0 < α ≤ 1. The characteristic exponent α represents the weight or length of the tail of
the pdf, with a smaller value denoting a longer tail or a higher degree of impulsiveness.
The symmetry parameter β determines the symmetry of the pdf with β = 0 resulting
in a symmetric pdf. The dispersion parameter γ plays a role similar to the variance
of a normal distribution. For instance, the stable distribution is a normal distribution
and the variance is 2γ when α = 2. When α = 1 and β = 0, the stable distribution
is a Cauchy distribution.
f (x)
α = 0.6
α = 1.0
α = 1.4
α = 2.0
m x
It is known that the pdf (5.A.53) can be expressed more explicitly in a closed form
when α = 1 and 2. Figure 5.11 shows pdf’s of the SαS distributions.
Let us show that the two infinite series in (5.A.53) become the Cauchy pdf
γ
f (x) = (5.A.54)
π x2 + γ2
when α = 1, and that the second infinite series of (5.A.53) is the normal pdf
1 x2
f (x) = √ exp − (5.A.55)
2 πγ 4γ
∞ −k−1
1 . (−1)k−1 kπ |x| γ
Γ (k + 1) sin = 2 , (5.A.56)
πγ k=1 k! 2 γ π x + γ2
402 5 Normal Random Vectors
which can also be obtained from the second infinite series of (5.A.53) as
2k 2k
1 + (−1)k 1 +
∞ ∞
Γ (2k + 1) γx = πγ (−1)k γx = π x 2γ+γ 2 . Next, noting that
πγ
k=0
(2k)!
k=0
( )
2k+1 (2k)! √ +
∞
(−x) k
Γ 2
= 22k k! π shown in (1.4.84) and that k!
= e−x , the second infinite
k=0
+∞
(−1)k
2k+1 x 2k
series of (5.A.53) for α = 2 can be rewritten as 2π1√γ (2k)!
Γ 2
√
γ
=
k=0
+∞ k +
∞ k
(−1)k x2 x2
√1
2 πγ 22k k! γ
= 2√1πγ 1
k!
− 4γ , i.e.,
k=0 k=0
∞
1 . (−1)k 2k + 1 x 2k 1 x2
√ Γ √ = √ exp − . (5.A.57)
2π γ (2k)! 2 γ 2 πγ 4γ
k=0
3
When A ∼ U − π2 , π2 and an exponential random variable B with mean 1 are
independent of each other, it is known that
1−α
sin(α A) cos{(1 − α)A} α
X = 1 (5.A.58)
(cos A) α B
is a standard SαS random variable. This result is useful when generating random
numbers obeying the SαS distribution.
Definition 5.A.5 (bi-variate isotropic SαS distribution) When the joint pdf of a
random vector (X, Y ) can be expressed as
∞ ∞ α
1
f X,Y (x, y) = 2
exp −γ ω12 + ω22 2
4π −∞ −∞
× exp {− j (xω1 + yω2 )} dω1 dω2 , (5.A.59)
Expressing the pdf (5.A.59) of the bi-variate isotropic SαS distribution in infinite
series, we have
⎧
⎪ +
∞
⎪
⎪
1
2 (−1)k−1 Γ 2 1 + αk
1 αk
⎪ π γ α k=1
⎪ 2
2 k! 2
⎪
⎪
⎪
⎪
⎪ √x 2 +y 2 −αk−2
⎨ × sin kαπ
2 1 ,
γα
f X,Y (x, y) = (5.A.60)
⎪
⎪ for 0 < α ≤ 1,
⎪
⎪ +∞ 2k+2 x 2 +y 2 k
⎪
⎪
⎪
⎪
1 1
Γ − 2 ,
⎪
⎪ 2παγ α k=0 (k!)
2 2 α 4γ α
⎩
for 1 ≤ α ≤ 2.
Appendices 403
Example 5.A.1 Show that (5.A.60) represents a bi-variate Cauchy distribution and
a bi-variate normal distribution for α = 1 and α = 2, respectively. In other words,
show that the two infinite series of (5.A.60) become
γ
f X,Y (x, y) = 23 (5.A.61)
2π x 2 + y 2 + γ 2
when α = 2.
2k+3 1 √
Solution First, note that we have Γ + k Γ 21 + k = (2k+1)!
2
= 2 22k+1 k!
π from
− 23
+∞
(1.4.75) and (1.4.84). Thus, recollecting that (1 + x) = k
− 23 Ck x , i.e.,
k=0
∞
. (−1)k (2k + 1)!
(1 + x)− 2 =
3
xk, (5.A.63)
k=0
22k (k!)2
we get
∞ / 2 0−k−2
1 . 2k (−1)k−1 2 k kπ x + y2
Γ + 1 sin
π2 γ 2 k! 2 2 γ
k=1
⎧ / 0 / 03
1 ⎨ 21 Γ 2 23 γ 23 Γ 2 25 γ
= 2 2 −
π x + y2 ⎩ 1! x 2 + y2 3! x 2 + y2
/ 05 / 07 ⎫
25 Γ 2 27 γ 27 Γ 2 29 γ ⎬
+ − + ···
5! x 2 + y2 7! x 2 + y2 ⎭
.∞ / 02k+1
1 (−1)k 22k+1 2 2k + 3 γ
= 2 2 Γ
π x + y2 (2k + 1)! 2 x 2 + y2
k=0
∞
/ 0 2k+1
1 . (−1)k (2k + 1)! γ
= 2
π x +y 2 22k+1 (k!)2 x 2 + y2
k=0
∞
/ 0k
1 γ . (−1)k (2k + 1)! γ2
= ×
2π x 2 + y 2 x 2 + y2 22k (k!)2 x 2 + y2
k=0
/ 0− 3
γ γ2 2
= 3 1+ 2
x + y2
2π x 2 + y 2 2
γ
= 3 (5.A.64)
2π x 2 + y 2 + γ 2 2
404 5 Normal Random Vectors
when α = 1 from the first infinite series of (5.A.60). The result (5.A.64)
+
∞ 2 2 k +∞ 2 2 k
Γ (2k+2) x +y (2k+1)!(−1)k x +y
can also be obtained as 2πγ (k!)
2 2 − 4γ 2 = 2πγ (k!) 2
2 2 2k γ2
=
k=0 k=0
− 23
x 2 +y 2
1
2πγ 2
1+ γ2
, i.e.,
.∞ 2 k
Γ (2k + 2) x + y2 γ
2 (k!)2
− 2
= 3 (5.A.65)
k=0
2πγ 4γ 2π x 2 + y 2 + γ 2 2
from the second infinite series of (5.A.60) using (5.A.63). Next, when α = 2,
2 2 k
1 + Γ (k+1)
∞
+y
from the second infinite series of (5.A.60), we get 4πγ (k!)2
− x 4γ = 4πγ
1
k=0
+∞ 2 2 k +
∞
x +y (−x)k
1
k!
− 4γ , which is the same as (5.A.62) because k!
= e−x . ♦
k=0 k=0
Exercises
Exercise 5.1 Assume a random vector (X, Y ) with the joint pdf
1 2 2
f X,Y (x, y) = √ exp − x 2 + y 2 cosh xy . (5.E.1)
π 3 3 3
Exercise 5.2 When X 1 ∼ N (0, 1) and X 2 ∼ N (0, 1) are independent of each other,
obtain the conditional joint pdf of X 1 and X 2 given that X 12 + X 22 < a 2 .
Exercise 5.3 Assume that X 1 ∼ N (0, 1) and X 2 ∼ N (0, 1) are independent of each
other.
√
(1) Obtain the joint pdf of U = X 2 + Y 2 and V = tan−1 YX .
(2) Obtain the joint pdf of U = 21 (X + Y ) and V = 21 (X − Y )2 .
Exercise 5.4 Obtain the conditional pdf’s f Y |X (y|x) and f X |Y (x|y) when (X, Y )
∼ N (3, 4, 1, 2, 0.5).
2
σ12 − σ22
0 ≤ ρ2Z W ≤ (5.E.2)
σ12 + σ22
when X 1 ∼ N μ1 , σ12 and X 2 ∼ N μ2 , σ22 are independent of each other.
Exercise 5.6 When the two normal random variables X and Y are independent of
each other, show that X + Y and X − Y are independent of each other.
Exercise 5.7 Let us consider (5.2.1) and (5.2.2) when n = 3 and s = 1. Based on
−1
−1 1 ρ23
(5.1.18) and (5.1.21), show that Ψ 22 − Ψ 21 Ψ 11 Ψ 12 is equal to K 22 = .
ρ23 1
Exercise 5.8 Consider the random variable
Y, when Z = +1,
X = (5.E.3)
−Y, when Z = −1,
where Z is a binary random variable with pmf p Z (1) = p Z (−1) = 0.5 and Y ∼
N (0, 1).
(1) Obtain the conditional cdf FX |Y (x|y).
(2) Obtain the cdf FX (x) of X and determine whether or not X is normal.
(3) Is the random vector (X, Y ) normal?
(4) Obtain the conditional pdf f X |Y (x|y) and the joint pdf f X,Y (x, y).
⎛ 1 1 5.9
Exercise ⎞ For a zero-mean normal random vector X with covariance matrix
1 6 36
⎝ 1 1 1 ⎠, find a linear transformation to decorrelate X.
6 6
1 1
36 6
1
Exercise 5.10 Let X = (X, Y ) denote the coordinate of a point in the two-
√ and C = (R, Θ) be its polar coordinate. Specifically, as shown in
dimensional plane
Fig. 5.12, R = X 2 + Y 2 is the distance from the origin to X, and Θ = ∠X is the
angle between the positive x-axis and the line from the origin to X, where we assume
−π < Θ ≤ π. Express the joint pdf of C in terms of the joint pdf of X. When X is
an i.i.d. random vector with marginal distribution N 0, σ 2 , prove or disprove that
C is an independent random vector.
Exercise 5.11 For the limit pdf lim f X 1 ,X 2 (x, y) shown in (5.1.15), show that
ρ→±1
∞ ∞
−∞ lim f X 1 ,X 2 (x, y)dy = f X 1 (x) and −∞ lim f X 1 ,X 2 (x, y)d x = f X 2 (y).
ρ→±1 ρ→±1
Y X = (X, Y )
Θ
X
based on the moment theorem. Show (5.E.4) based on Taylor series of the cf.
Exercise 5.16 When (Z , W ) ∼ N m 1 , m 2 , σ12 , σ22 , ρ , obtain E Z 2 W 2 .
Exercise 5.17 Denote the joint moment by μi j = E X i Y j for a zero-mean random
vector (X, Y ). Based on the moment theorem, (5.3.22),
(5.3.30), or
(5.3.51), obtain
μ51 , μ42 , and μ33 for a random vector (X, Y ) ∼ N 0, 0, σ12 , σ22 , ρ .
2
ρ|X 1 ||X 2 | = 1 − ρ2 + ρ sin−1 ρ − 1 . (5.E.5)
π−2
Exercises 407
1 π
E {X 1 u (X 1 ) X 2 u (X 2 )} = + sin−1 ρ ρ + 1 − ρ2 , (5.E.6)
2π 2
which12 implies that E {W Y u(W )u(Y )} = σW2πσY ρ cos−1 (−ρ) + 1 − ρ2 when
(W, Y ) ∼ N 0, 0, σW 2
, σY2 , ρ . In addition, when W = Y , we can obtain
2
E W u(W ) = 21 σW 2
with ρ = 1 and σY = σW , which can be proved by a
∞ ∞
direct integration as E W 2 u(W ) = 0 √ x 2 exp − 2σx 2 d x = 21 −∞ √ x 2
2 2 2
2
2πσW W 2πσW
exp − 2σx 2 d x = 21 σW 2
.
W
and
2
E {δ (X 1 ) sgn (X 2 ) sgn (X 3 )} = sin−1 β23,1 (5.E.12)
π3
12 Here, the range of the inverse cosine function cos−1 x is [0, π], and cos sin−1 ρ = 1 − ρ2 .
Note that, letting π2 + sin−1 ρ = θ, we get cos θ = cos π2 + sin−1 ρ = − sin sin−1 ρ = −ρ and,
π
subsequently, 2 + sin−1 ρ ρ = ρ cos−1 (−ρ). Thus, we have θ = cos−1 (−ρ).
13 Here, using E {sgn (X ) sgn (X )} = 2 sin−1 ρ
2 3 π 23 obtained in (5.3.29) and E {δ (X 1 )}
= √1 , we can obtain E {δ (X 1 ) sgn (X 2 ) sgn (X 3 )}|ρ31 =ρ12 =0 = E {δ (X 1 )} E {sgn
2π
(X 2 ) sgn (X 3 )} = π23 sin−1 ρ23 from (5.E.12) when ρ31 = ρ12 = 0. This result is the same
as π23 sin−1 β23,1 .
ρ31 =ρ12 =0
408 5 Normal Random Vectors
and
2 2ρ31 ρ12 − ρ23 ρ212 − ρ23 ρ231
E X 12 sgn (X 2 ) sgn (X 3 ) =
π 1 − ρ223
2
+ sin−1 ρ23 (5.E.16)
π
14 We can easily get E X 12 X 2 = E {|X 1 X 2 X 3 |}| X 3 →X 1 = E {|X 1 X 2 X 3 |}|ρ31 =1 =
π3
8
0 + π2 1 + ρ2 = π2 1 + ρ2 with (5.E.14). Similarly, with (5.E.15), it is easy
to get E {|X 3 |} = E {sgn (X 1 ) sgn (X 2 ) |X 3 |}| X 2 →X 1 = E {sgn (X 1 ) sgn (X 2 ) |X 3 |}|ρ12 =1 =
2
8
sin −1 1−ρ23 + 2ρ sin−1 0 = 2
and E {sgn (X 1 ) X 2 } = E {sgn (X 1 ) sgn
π3 1−ρ223 23 π
(X 2 ) |X 3 |}|ρ23 =1 = π83 0 + 0 + ρ12 sin−1 1 = π2 ρ12 . Next, when |ρ23 | → 1, we
have ρ31 → sgn (ρ23 ) ρ12 because X 3 → X 2 , and thus lim 2ρ31 ρ12 − ρ23 ρ212 −
ρ23 →1
2ρ31 ρ12 −ρ23 ρ212 −ρ23 ρ231 −ρ212 −ρ231
ρ23 ρ231 = 0. Consequently, we get lim = lim =
ρ23 →1 1−ρ223 ρ23 →1 √
−2ρ23
2
2 1−ρ23
2
ρ12 +ρ231 1−ρ223
lim ρ23 = 0 in (5.E.16) using L’Hospital’s theorem.
ρ23 →1
√|K 3 |
15 Based on this result, we have 1−ρ2
dρ12 = sin−1 β12,3 + ρ23 sin−1 β31,2 + ρ31
12
sin−1 β23,1 + h (ρ23 , ρ31 ) for a function h.
Exercises 409
Exercise 5.25 Find thecoefficient of the term ρ̃212 ρ̃22 ρ̃434 m 1 m 4 in the expansion
of the joint moment E X 13 X 24 X 34 X 45 for a quadri-variate normal random vector
(X 1 , X 2 , X 3 , X 4 ).
2
E {|X 1 X 2 |} = 1 − ρ2 + ρ sin−1 ρ (5.E.17)
π
for (X 1 , X 2 ) ∼ N (0, 0, 1, 1, ρ). The result (5.E.17) is obtained with other meth-
ods in Exercises 5.19 and 5.20. When (X1 , X 2 ) ∼ N (0, 0, 1, 1, X1 =
ρ) and
2 −1
X 2 , (5.E.17) can be written as E X 1 = π ρ sin ρ + 1 − ρ σ1 σ2
2 2
ρ=1,σ2 =σ1
= σ12 , implying E X 2 = σ 2 when X ∼ N 0, σ 2 .
Exercise 5.27 Let us show some results related to the general formula (5.3.51) for
the joint moments of normal random vectors.
(1) Confirm the coefficient
a!
da, j = (5.E.18)
2 j j! (a − 2 j)!
in (5.3.17).
(2) Recollecting (5.3.46)–(5.3.48), show that
⎛ ⎞⎛ ⎞
. *3 *
3 *
3
da,l ⎝ ρ̃iijj ⎠ ⎝ m j a, j ⎠
l L
E X 1a1 X 2a2 X 3a3 = (5.E.19)
l∈Sa i=1 j=i j=1
a1 !a2 !a3 !
da,l = +
3 / 0/ 0 (5.E.20)
ljj 1
3 13 1
3
2 j=1 li j ! L a, j !
i=1 j=i j=1
when n = 3.
(3) Show that (5.3.51) satisfies (5.A.1) and (5.A.2).
for the generalized normal random variable X with the pdf (5.A.39).
410 5 Normal Random Vectors
Exercise 5.29 When vk > r and r is even, show the r -th moment
1
Γ v − rk Γ r +1 2 −1
r
r Γ
= v σGr r 3
r k k
E X k (5.E.22)
Γ (v)Γ 2 k
for the generalized Cauchy random variable X with the pdf (5.A.44).
γ 2
Exercise 5.30 Obtain the pdf f X (x) of X from the joint pdf f X,Y (x, y) = 2π x
− 3
+y 2 + γ 2 2 shown in (5.A.61). Confirm that the pdf is the same as the pdf f (r ) =
α
2 −1
π
r + α2 obtained by letting β = 0 in (2.5.28).
Exercise 5.31 Show that the mgf of the sample mean is
n
t
M X n (t) = M (5.E.23)
n
of X ∼ t (k).
Exercise 5.34 Obtain the mean and variance of Z ∼ t (n, δ).
Exercise 5.35 For H ∼ F(m, n), show that
n k Γ m + k Γ n − k
E H k
= 2
2 (5.E.26)
m Γ m2 Γ n2
9n:
for k = 1, 2, . . . , 2
− 1.
Exercise 5.36 Obtain the mean and variance of H ∼ F(m, n, δ).
Exercise 5.37 For i.i.d. random variables X 1 , X 2 , X 3 , and X 4 with marginal distri-
bution N (0, 1), show that the pdf of Y = X 1 X 2 + X 3 X 4 is f Y (y) = 21 e−|y| .
Exercises 411
+
n
for the sample mean X n = 1
n
X i with X 0 = 0.
i=1
Exercise 5.40 Let us denote the k-th central moment of X i by E (X i − m)k = μk
for k = 0, 1, . . .. Obtain the fourth central moment μ4 X n of the sample mean X n
for a sample X = (X 1 , X 2 , . . . , X n ).
Exercise 5.41 Prove Theorem 5.1.4 by taking the steps described below.
(1) Show that the pdf f 3 (x, y, z) shown in (5.1.19) can be written as
2
1 (x + t12 y)2 1 − t12
2
y
f 3 (x, y, z) = exp − exp −
8π 3 |K 3 | 2 1 − ρ212 2 1 − ρ12 2
1 − ρ212
× exp − (z + b12 ) ,
2
(5.E.28)
2 |K 3 |
where t12 = |K q12
3|
and b12 = c231−ρ
y+c31 x
2 with q12 = c12 1 − ρ212 − c23 c31 and ci j =
ρ jk ρki − ρi j .
12
1 − ρ212 1
lim = . (5.E.29)
ρ12 →±1 |K 3 | 1 − ρ223
α
Subsequently, using lim π
exp −αx 2 = δ(x), show that
α→∞
1 (x + t12 y)2 δ(x − ξ12 y)
lim exp − = , (5.E.30)
ρ12 →±1 8π 3 |K 3 | 2 1 − ρ212 2π 1 − ρ223
where ξi j = sgn ρi j .
2
1−t12
(3) Show that lim 2 = 1, which instantly yields
ρ12 →±1 1−ρ12
412 5 Normal Random Vectors
2
1 − t12
2
y 1 2
lim exp − = exp − y . (5.E.31)
ρ12 →±1 2 1 − ρ212 2
where μ1 (x, y) = 21 ξ12 (ρ23 x + ρ31 y). Combining (5.E.30), (5.E.31), and
(5.E.32) into (5.E.28), and noting that ρ23 = ξ12 ρ31 when ρ12 → ±1 and that
y can be replaced with ξ12 x due to the function δ(x − ξ12 y), we get (5.1.36).
(5) Obtain (5.1.37) from (5.1.36).
Exercise 5.42 Assume (X, Y ) has the standard bi-variate normal pdf φ2 .
X 2 −2ρX Y +Y 2
(1) Obtain the pdf and cdf of V = g(X, Y ) = 2(1−ρ2 )
.
(2) Note that φ2 (x, y) = c is equivalent to x −2ρx y + y = c1 , an ellipse, for
2 2
x
Exercise 5.43 Consider (X 1 , X 2 ) ∼ N (0, 0, 1, 1, ρ) and g(x) = 2β 0
α
φ(z)dz,
i.e.,
x
g(x) = β 2Φ −1 , (5.E.33)
α
where α > 0, β > 0, Φ is the standard normal cdf, and φ is the standard normal
pdf. Obtain the correlation RY = E {Y1 Y2 } and correlation coefficient ρY between
Y1 = g (X 1 ) and Y2 = g (X 2 ). Obtain the values of ρY when α2 = 1 and α2 → ∞.
Note that g is a smoothly increasing function from −β to β. When α = 1, we have
β {2Φ (X i ) − 1} ∼ U (−β, β) because Φ(X ) ∼ U (0, 1) when X ∼ Φ from (3.2.50).
Exercise 5.44 In Figs. 5.1, 5.2 and 5.3, show that the angle θ between the major
axis of the ellipse and the positive x-axis can be expressed as (5.1.9).
References
M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
J. Bae, H. Kwon, S.R. Park, J. Lee, I. Song, Explicit correlation coefficients among random variables,
ranks, and magnitude ranks. IEEE Trans. Inform. Theory 52(5), 2233–2240 (2006)
References 413
W. Bär, F. Dittrich, Useful formula for moment computation of normal random variables with
nonzero means. IEEE Trans. Automat. Control 16(3), 263–265 (1971)
R.F. Baum, The correlation function of smoothly limited Gaussian noise. IRE Trans. Inform. Theory
3(3), 193–197 (1957)
J.L. Brown Jr., On a cross-correlation property of stationary processes. IRE Trans. Inform. Theory
3(1), 28–31 (1957)
W.B. Davenport Jr., Probability and Random Processes (McGraw-Hill, New York, 1970)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products (Academic, New York, 1980)
J. Hajek, Nonparametric Statistics (Holden-Day, San Francisco, 1969)
J.B.S. Haldane, Moments of the distributions of powers and products of normal variates. Biometrika
32(3/4), 226–242 (1942)
G.G. Hamedani, Nonnormality of linear combinations of normal random variables. Am. Stat. 38(4),
295–296 (1984)
B. Holmquist, Moments and cumulants of the multivariate normal distribution. Stochastic Anal.
Appl. 6(3), 273–278 (1988)
R.A. Horn, C.R. Johnson, Matrix Analysis (Cambridge University Press, Cambridge, 1985)
L. Isserlis, On a formula for the product-moment coefficient of any order of a normal frequency
distribution in any number of variables. Biometrika 12(1/2), 134–139 (1918)
N.L. Johnson, S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions (Wiley,
New York, 1972)
A.R. Kamat, Incomplete moments of the trivariate normal distribution. Indian J. Stat. 20(3/4),
321–322 (1958)
R. Kan, From moments of sum to moments of product. J. Multivariate Anal. 99(3), 542–554 (2008)
S. Kotz, N. Balakrishnan, N.L. Johnson, Continuous Multivariate Distributions, 2nd edn. (Wiley,
New York, 2000)
E.L. Melnick, A. Tenenbein, Misspecification of the normal distribution. Am. Stat. 36(4), 372–373
(1982)
D. Middleton, An Introduction to Statistical Communication Theory (McGraw-Hill, New York,
1960)
G.A. Mihram, A cautionary note regarding invocation of the central limit theorem. Am. Stat. 23(5),
38 (1969)
T.M. Mills, Problems in Probability (World Scientific, Singapore, 2001)
S. Nabeya, Absolute moments in 3-dimensional normal distribution. Ann. Inst. Stat. Math. 4(1),
15–30 (1952)
C.L. Nikias, M. Shao, Signal Processing with Alpha-Stable Distributions and Applications (Wiley,
New York, 1995)
J.K. Patel, C.H. Kapadia, D.B. Owen, Handbook of Statistical Distributions (Marcel Dekker, New
York, 1976)
J.K. Patel, C.B. Read, Handbook of the Normal Distribution, 2nd edn. (Marcel Dekker, New York,
1996)
D.A. Pierce, R.L. Dykstra, Independence and the normal distribution. Am. Stat. 23(4), 39 (1969)
R. Price, A useful theorem for nonlinear devices having Gaussian inputs. IRE Trans. Inform. Theory
4(2), 69–72 (1958)
V.K. Rohatgi, A.K. Md. E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley,
New York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
I. Song and S. Lee, Explicit formulae for product moments of multivariate Gaussian random vari-
ables. Stat. Prob. Lett. 100, 27–34 (2015)
414 5 Normal Random Vectors
I. Song, S. Lee, Y.H. Kim, S.R. Park, Explicit formulae and implication of the expected values of
some nonlinear statistics of tri-variate Gaussian variables. J. Korean Stat. Soc. 49(1), 117–138
(2020)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
K. Triantafyllopoulos, On the central moments of the multidimensional Gaussian distribution. Math.
Sci. 28(2), 125–128 (2003)
G.A. Tsihrintzis, C.L. Nikias, Incoherent receiver in alpha-stable impulsive noise. IEEE Trans.
Signal Process. 43(9), 2225–2229 (1995)
G.L. Wies, E.B. Hall, Counterexamples in Probability and Real Analysis (Oxford University, New
York, 1993)
C.S. Withers, The moments of the multivariate normal. Bull. Austral. Math. Soc. 32(1), 103–107
(1985)
Chapter 6
Convergence of Random Variables
In this chapter, we discuss sequences of random variables and their convergence. The
central limit theorem, one of the most important and widely-used results in many
areas of the applications of random variables, will also be described.
Definition 6.1.1 (sure convergence; almost sure convergence) For every point ω of
the sample space on which the random variable X n is defined, if
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 415
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8_6
416 6 Convergence of Random Variables
Example 6.1.2 (Leon-Garcia 2008) For a randomly chosen point ω ∈ [0, 1], assume
that P(ω ∈ (a, b)) = b − a for 0 ≤ a ≤ b ≤ 1.Now consider
the five sequences of
random variables An (ω) = ωn , Bn (ω) = ω 1 − n1 , Cn (ω) = ωen , Dn (ω) =
cos 2π nω, and Hn (ω) = exp{−n(nω − 1)}. The sequence {An (ω)}∞ n=1 converges
always to 0 for any value of ω ∈ [0, 1], and thus it is surely convergent to 0. The
sequence {Bn (ω)}∞n=1 converges to ω for any value of ω ∈ [0, 1], and thus it is surely
convergent to ω with the limit distribution U [0, 1]. The sequence {Cn (ω)}∞ n=1 con-
verges to 0 when ω = 0 and diverges when ω ∈ (0, 1]: in other words, it is not
convergent. The sequence {Dn (ω)}∞ n=1 converges to 1 when ω ∈ {0, 1} and oscil-
lates between −1 and 1 when ω ∈ (0, 1): in other words, it is not convergent.
When n → ∞, Hn (0) = en → ∞ for ω = 0 and Hn (ω) → 0 for ω ∈ (0, 1]: in other
words, {Hn (ω)}∞n=1 is not surely convergent. However, because P(ω ∈ (0, 1]) = 1,
{Hn (ω)}∞n=1 converges almost surely to 0. ♦
6.1 Types of Convergence 417
∞
P (|X n | > ε) < ∞ (6.1.5)
n=1
a.s.
for ε > 0, it is easy to see that X n −→ 0 as n → ∞ from the Borel-Cantelli lemma.
In addition, even if we change the condition (6.1.5) into
∞
P (|X n | > εn ) < ∞ (6.1.6)
n=1
a.s.
for a sequence {εn }∞
n=1 such that εn ↓ 0, we still have X n −→ 0. Now, when ω is a
randomly chosen point in [0, 1], for a sequence {X n }∞
n=1 with
0, 0 ≤ ω ≤ 1 − n1 ,
X n (ω) = (6.1.7)
1, 1 − n1 < ω ≤ 1,
a.s.
we have X n −→ 0 as n → ∞. However, for any εn such that εn ↓ 0, if we con-
sider a sufficiently large n, we have P (|X n | > εn ) = P (X n = 1) = n1 and thus
∞
P (|X n | > εn ) → ∞ for the sequence (6.1.7). In other words, (6.1.6) is a suf-
n=1
ficient condition for the sequence to converge almost surely to 0, but not a necessary
condition. ♦
a.s.
Theorem 6.1.1 (Rohatgi and Saleh 2001; Stoyanov 2013) If X n −→ X for a
sequence {X n }∞
n=1 , then
holds true for every ε > 0, and the converse also holds true.
a.s. a.s.
Proof If X n −→ X , then we have X n − X −→ 0. Thus, let us show that
a.s.
X n −→ 0 (6.1.9)
and
r
then {X n }∞
n=1 is called to converge to X in the r -th mean, and is denoted by X n → X
r
L
or X n −→ X .
6.1 Types of Convergence 419
is written also as
l.i.m. X n = X, (6.1.14)
n→∞
for a sequence {X n }∞
n=1 . Then, the sequence
converges in the mean square to X such
that P(X = 0) = 1 because lim E |X n |2 = lim n1 = 0.
n→∞ n→∞
1 1
E |X n − X m |2 ≤ + . (6.1.17)
n m
Therefore, {X n }∞
n=1 converges in the mean square. ♦
420 6 Convergence of Random Variables
2
Example 6.1.7 We have lim E |Bn − Bm |2 = lim E n1 − m1 ω2 =
n,m→∞ n,m→∞
2
E ω2 lim n1 − m1 = 0 for the sequence {Bn }∞
n=1 in Example 6.1.5.
n,m→∞
Mean square convergence implies that more and more sequences are close to
the limit X as n becomes larger. However, unlike in almost sure convergence, the
sequences close to X do not necessarily always stay close to X .
p
for every ε > 0, and is denoted by X n → X .
Note that (6.1.18) implies that almost every sequence is within a range of 2ε at any
given time but that the sequence is not required to stay in the range. However, (6.1.8)
dictates that a sequence is required to stay within the range 2ε once it is inside the
This can easily be confirmed by interpreting the meanings of {|X n − X | > ε}
range.
and sup |X m − X | > ε .
m≥n
for a sequence {X n }∞
n=1 . Then, because
0, ε ≥ 1,
P (|X n | > ε) = (6.1.20)
1
n
, 0 < ε < 1,
p
we have lim P (|X n | > ε) = 0 and thus X n → 0. ♦
n→∞
6.1 Types of Convergence 421
p
We then have X n → 5 because
⎧
⎨ 0, ε ≥ 2,
P (|X n − 5| > ε) = 1
, 1 ≤ ε < 2, (6.1.22)
⎩ 2n
1
n
, 0 < ε < 1,
for all points at which the cdf F of X is continuous, then the sequence {X n }∞
n=1 is
d
said to converge weakly, in law, or in distribution to X , and is written as X n → X or
l
Xn → X .
x √
n nt 2
Example 6.1.12 For the cdf Fn (x) = √
−∞ σ 2π exp − 2σ 2 dt of X n , we have
⎧
⎨ 0, x < 0,
lim Fn (x) = 1
, x = 0, (6.1.24)
n→∞ ⎩2
1, x > 0.
Thus, {X n }∞
n=1 converges weakly to a random variable X with the cdf
0, x < 0,
F(x) = (6.1.25)
1, x ≥ 0.
422 6 Convergence of Random Variables
Note that although lim Fn (0) = F(0), the convergence in distribution does not
n→∞
require the convergence at discontinuity points of the cdf: in short, the convergence
of {Fn (x)}∞
n=1 at the discontinuity point x = 0 of F(x) is not a prerequisite for the
convergence in distribution. ♦
We now discuss the relations among various types of convergence discussed in pre-
vious sections. First, let
D ⊃ P ⊃ Ms ⊃ Mt (6.1.26)
and
P ⊃A. (6.1.27)
d
X n = X, (6.1.28)
d
we have X n → X . However, because P (|X n − X | > ε) = P |X | > 2ε 0 when
p
n → ∞, we have X n X . ♦
Example 6.1.14 For the sample space S = {ω1 , ω2 , ω3 , ω4 }, assume the event space
2 S and the uniform probability measure. Define {X n }∞
n=1 by
0, ω = ω3 or ω4 ,
X n (ω) = (6.1.29)
1, ω = ω1 or ω2 .
d
1 Here, = means ‘equal in distribution’ as introduced in Example 3.5.18.
6.1 Types of Convergence 423
Also let
0, ω = ω1 orω2 ,
X (ω) = (6.1.30)
1, ω = ω3 orω4 .
d
In other words, X n → X . Meanwhile, because |X n (ω) − X (ω)| = 1 for ω ∈ S and
n ≥ 1, we have P(|X n − X | > ε) 0 for n → ∞. Thus, X n does not converge to
X in probability.
∞
Noting that 1− 1
k
= 0 for any natural number m, we get
k=m
1 1 1
P (Bm (ε)) = 1 − lim 1− 1− ··· 1 −
M→∞ m m+1 M
=1 (6.1.33)
because {X n }∞
n=1 is an independent sequence. Thus, lim P (Bm (ε)) = 0 and, from
m→∞
a.s.
Theorem 6.1.1, X n 0. ♦
Example 6.1.16 (Rohatgi and Saleh 2001; Stoyanov 2013) Based on the inequal-
1
ity |x|s ≤ 1 + |x|r for 0 < s < r , or on the Lyapunov inequality E {|X |s } s ≤
1
E {|X |r } r for 0 < s < r shown in (6.A.21), we can easily show that
Lr Ls
X n −→ X ⇒ X n −→ X (6.1.34)
r +s
n− 2 , x = n,
P (X n = x) = r +s (6.1.35)
1 − n− 2 , x = 0
424 6 Convergence of Random Variables
Lr s−r
for 0 < s < r , then X n −→ 0 because E X ns = n 2 → 0 for n → ∞: however,
r −s Ls
because E X nr = n 2 → ∞, we have X n 0. In short,
Ls Lr
X n −→ 0 X n −→ 0 (6.1.36)
for r > s. ♦
p
for a sequence {X n }∞
n=1 , then X n → 0 because P (|X n | < ε) = P (X n = 0) = 1 −
Lr
1
nr n
→ 1 for ε > 0 when n → ∞. However, we have X n 0 because E X nr =
e
n
→ ∞.
Example 6.1.18 (Rohatgi and Saleh 2001) Note first that a natural number n can be
expressed uniquely as
n = 2k + m (6.1.38)
with integers k ∈ {0, 1, . . .} and m ∈ 0, 1, . . . , 2k − 1 . Define a sequence {X n }∞
n=1
by
2k , 2mk ≤ ω < m+1
,
X n (ω) = 2k (6.1.39)
0, otherwise
Then, lim X n (ω) does not exist for any choice ω ∈ and, therefore, the sequence
n→∞
does not converge almost surely. However, because
p
we have lim P (|X n | > ε) = 0, and thus X n → 0. ♦
n→∞
6.1 Types of Convergence 425
∞ 1 1
where α > 0. Then, E |X n | 2 < ∞ when α > 23 because E |X n | 2 =
n=1
√ √
n −α+ 2 . Letting α = ε and X = α |X n | in the Markov inequality P(X ≥ α) ≤
1
E{X }
introduced in (6.A.15), we have P (|X n | > ε) ≤ ε− 2 E |X | 2 , and thus
1 1
α
∞
P (|X n | > ε) < ∞ when ε > 0. Now, employing Borel-Cantelli lemma as in
n=1
a.s. L2
Example 6.1.3, we have X n −→ 0. Meanwhile, X n 0 when α ≤ 2 because
a.s. L2
E |X n |2 = n α−2
1
. In essence, we have X n −→ 0, yet X n 0 for α ∈ 23 , 2 . ♦
d d
Example 6.1.20 (Stoyanov 2013) When X n → X and Yn → Y , we have X n +
d
Yn → X + Y if the sequences {X n }∞ ∞
n=1 and {Yn }n=1 are independent of each other.
On the other hand, if the two sequences are not independent of each other, we may
d
have different results. For example, assume X n → X ∼ N (0, 1) and let Yn = α X n .
Then, we have
d
Yn → Y ∼ N 0, α 2 , (6.1.43)
and the distribution of X n + Yn = (1 + α)X
n 2converges
to N 0, (1 + α)2 . How-
ever, because X ∼ N (0, 1) and Y ∼ N 0, α , the distribution of X + Y is not
necessarily N 0, (1 + α)2 . In other words, if the sequences {X n }∞ ∞
n=1 and {Yn }n=1
d
are not independent of each other, it is possible that X n + Yn X + Y even when
d d
X n → X and Yn → Y . ♦
In this section, we will consider the sum of random variables and its convergence. We
will then introduce the central limit theorem (Davenport 1970; Doob 1949; Mihram
1969), one of the most useful and special cases of convergence.
426 6 Convergence of Random Variables
The sum of random variables is one of the key ingredients in understanding and
applying the properties of convergence and limits. We have discussed the properties
of the sum of random variables in Chap. 4. Specifically, the sum of two random
variables as well as the cf and distribution of the sum of a number of random variables
are discussed in Examples 4.2.4, 4.2.13, and 4.3.8. We now consider the sum of a
number of random variables more generally.
n
Theorem 6.2.1 The expected value and variance of the sum Sn = X i of the ran-
i=1
dom variables {X i }i=1
n
are
n
E {Sn } = E {X i } (6.2.1)
i=1
and
n
n−1
n
Var {Sn } = Var {X i } + 2 Cov X i , X j , (6.2.2)
i=1 i=1 j=i+1
respectively.
n n
Proof First, it is easy to see that E {Sn } = E Xi = E {X i }. Next, we have
n
i=1 i=1
n 2
Var ai X i = E ai X i − E {X i } , i.e.,
i=1 i=1
n ⎧ ⎫
⎨n
n
⎬
Var ai X i = E ai a j (X i − E {X i }) X j − E X j
⎩ ⎭
i=1 i=1 j=1
n
n−1
n
= ai2 Var {X i } + 2 ai a j Cov X i , X j (6.2.3)
i=1 i=1 j=i+1
n
Var {Sn } = Var {X i } (6.2.4)
i=1
Theorem 6.2.1 says that the expected value of the sum of random variables is the
sum of the expected values of the random variables. In addition, the variance of the
sum of random variables is obtained by adding the sum of the covariances between
two distinct random variables to the sum of the variances of the random variables.
Theorem 6.2.2 dictates that the variance of the sum of uncorrelated random variables
is simply the sum of the variances of the random variables.
Example
6.2.1 (Yates and Goodman 1999) Assume thatn the joint moments are
E X i X j = ρ |i− j| for zero-mean random variables {X i }i=1 . Obtain the expected
value and variance of Yi = X i−2 + X i−1 + X i for i = 3, 4, . . . , n.
because Var {X i } = ρ 0 = 1. ♦
Example 6.2.2 In a meeting of a group of n people, each person attends with a gift.
The name tags of the n people are put in a box, from which each person randomly
picks one name tag: each person gets the gift brought by the person on the name
tag. Let G n be the number of people who receive their own gifts back. Obtain the
expected value and variance of G n .
Then,
n
Gn = Xi . (6.2.7)
i=1
For any person, the probability of picking her/his own name tag is n1 . Thus, E {X i } =
1 × n1 + 0 × n−1 = n1 and Var {X i } = E X i2 − E2 {X i } = n1 − n12 . In addition,
n
because P X i X j = 1 = n(n−1)
1
and P X i X j = 0 = 1 − n(n−1)
1
for i = j, we have
Cov X i , X j = E X i X j − E {X i } E X j = n(n−1) − n 2 , i.e.,
1 1
1
Cov X i , X j = . (6.2.8)
n 2 (n − 1)
428 6 Convergence of Random Variables
n
Therefore, E {G n } = 1
n
= 1 and Var {G n } = nVar {X i } + n(n − 1)Cov (X i ,
i=1
X j = 1. In short, for any number n of the group, one person will get her/his own
gift back on average. ♦
Theorem 6.2.3 For independent random variables {X i }i=1n
, let the cf and mgf of X i
be ϕ X i (ω) and M X i (t), respectively. Then, we have
n
ϕ Sn (ω) = ϕ X i (ω) (6.2.9)
i=1
and
n
M Sn (t) = M X i (t) (6.2.10)
i=1
n
as the cf and mgf, respectively, of Sn = Xi .
i=1
n
ϕ Sn (ω) = ϕ X i (ω) (6.2.11)
i=1
n
of Sn = Xi .
i=1
n
ki
M Sn (t) = 1 − p + pet i=1 . (6.2.14)
n
This result implies Sn ∼ b ki , p . ♦
i=1
n n n
Example 6.2.5 We have shown that Sn = Xi ∼ N m i , σi2 when
n i=1 i=1 i=1
X i ∼ N m i , σi2 i=1 are independent of each other in Theorem 5.2.5.
n
Example 6.2.6 Show that Sn = X i ∼ G n, λ1 when {X i }i=1
n
are i.i.d. with
i=1
marginal exponential distribution of parameter λ.
λ
Solution The mgf of X i is M X (t) = λ−t as shown in (3.A.67). Thus, the mgf of Sn
λ n
is M Sn (t) = λ−t and, therefore, Sn ∼ G n, λ1 . ♦
Definition 6.2.1 (random sum) Assume that the support of the pmf of a random
variable N is a subset of {0, 1, . . .} and that the random variables {X 1 , X 2 , . . . ,
X N } are independent of N . The random variable
N
SN = Xi (6.2.15)
i=1
∞
M SN (t) = M Sn (t) p N (n), (6.2.16)
n=0
n
where p N (n) is the pmf of N and M Sn (t) is the mgf of Sn = Xi .
i=1
430 6 Convergence of Random Variables
∞
using M̃ N (g(t)) = E g N (t) = g n (t)P(N = n).
n=0
Example 6.2.7 Assume that i.i.d. exponential random variables {X n }∞ n=1 with mean
1
λ
are independent of a geometric random variable N with pmf p N (k) = (1 − α)k−1 α
N
for k ∈ {1, 2, . . .}. Obtain the distribution of the random sum S N = Xi .
i=1
αet
λ
Solution The mgf’s of N and X i are M N (t) = 1−(1−α)et
and M X (t) = λ−t
, respec-
α exp(ln λ−t
λ
)
tively. Thus, the mgf of S N is M SN (t) = 1−(1−α) exp(ln λ−t
, i.e.,
λ
)
αλ
M SN (t) = . (6.2.20)
αλ − t
1
Therefore, S N is an exponential random variable with mean αλ . This result is also
in agreement with the intuitive interpretation that S N is the sum of, on average, α1
variables of mean λ1 . ♦
6.2 Laws of Large Numbers and Central Limit Theorem 431
Theorem 6.2.6 When the random variables {X i } are i.i.d., we have the expected
value
N
of the random sum S N = Xi .
i=1
Proof
M X (t)
(Method 1) From (6.2.17), we have M S N (t) = M N (ln M X (t)) M X (t)
and
M X (t) 2
M SN (t) = M N (ln M X (t))
M X (t)
2
M X (t)M X (t) − M X (t)
+M N (ln M X (t)) . (6.2.23)
M X2 (t)
E S N2 = E N 2 E2 {X } + E{N }Var{X }. (6.2.25)
E{Y } = E {N } E {X } . (6.2.26)
N
N
N
Similarly, recollecting that Y 2 = X i2 + X i X j , the second moment
i=1 i=1 j=1
i=j
E Y 2 = E E Y 2 N can be evaluated as
432 6 Convergence of Random Variables
E Y 2 = E N E X 2 + N (N − 1)E2 {X }
= E {N } E X 2 − E2 {X } + E N 2 E2 {X }
= E {N } Var {X } + E N 2 E2 {X } . (6.2.27)
Sn − an p
→ 0 (6.2.28)
bn
Theorem 6.2.7 (Rohatgi and Saleh 2001) Assume a sequence of uncorrelated ran-
∞
dom variables {X i }i=1 with means E {X i } = m i and variances Var {X i } = σi2 . If
∞
σi2 → ∞, (6.2.30)
i=1
then
!−1 !
n
n
p
σi2 Sn − mi → 0. (6.2.31)
i=1 i=1
∞
In other words, the sequence {X i }i=1 satisfies the weak law of large numbers with
n 2
n
an = m i and bn = σi .
i=1 i=1
Var{Y }
Proof Employing the Chebyshev inequality P(|Y − E{Y }| ≥ ε) ≤ ε2
intro-
duced in (6.A.16), we have
! !−2 ⎡ 2 ⎤
n
n
n
n
P Sn − mi > ε σi ≤
2
ε σi2 E⎣ (X i − m i ) ⎦
i=1 i=1 i=1 i=1
!−1
n
= ε2 σi2 . (6.2.32)
i=1
n
n
In short, P Sn − m i > ε σi2 → 0 when n → ∞. ♠
i=1 i=1
∞
Example 6.2.9 (Rohatgi and Saleh 2001) If an uncorrelated sequence {X i }i=1 with
mean E {X i } = m i and variance Var {X i } = σi satisfies
2
1 2
n
lim σi = 0, (6.2.33)
n→∞ n 2
i=1
n p
then 1
n
Sn − mi → 0. This result, called the Markov theorem, can be easily
i=1
shown with the steps similar to those in the proof of Theorem 6.2.7. Here, (6.2.33)
is called the Markov condition. ♦
From now on, we assume bn = n in discussing the weak law of large numbers
unless specified otherwise.
Example 6.2.11 (Rohatgi and Saleh 2001) For an i.i.d. sequence of random variables
with distribution b(1, p), noting that the mean is p and the variance is p(1 − p), we
p
have Snn → p from Theorem 6.2.7 and Example 6.2.9. ♦
Example 6.2.12 (Rohatgi and Saleh 2001) For a sequence of i.i.d. random variables
with marginal distribution C(1, 0), we have Snn ∼ C(1, 0) as discussed in Exercise
6.3. In other words, because Snn does not converge to 0 in probability, the weak law
of large numbers does not hold for sequences of i.i.d. Cauchy random variables.
Example 6.2.13 (Rohatgi and Saleh 2001; Stoyanov 2013) For an i.i.d. sequence
∞ p
{X i }i=1 , if the absolute mean E {|X i |} is finite, then Snn → E {X 1 } when n → ∞ from
Theorem 6.2.7 and Example 6.2.9. This result is called Khintchine’s theorem.
Sn − an a.s.
−→ 0 (6.2.34)
bn
Sn − an
P lim = 0 = 1. (6.2.35)
n→∞ bn
A sequence of random variables that follows the strong law of large numbers also
follows the weak law of large numbers because almost sure convergence implies
convergence in probability. As in the discussion of the weak law of large numbers,
we often assume the normalizing constant bn = n also for the strong law of large
∞
numbers. We now consider sufficient conditions for a sequence {X i }i=1 to follow the
strong law of large numbers when bn = n.
∞
Theorem 6.2.8 (Rohatgi and Saleh 2001) The sum (X i − μi ) converges almost
i=1
surely to 0 if
∞
σi2 < ∞ (6.2.36)
i=1
∞ ∞
∞
for a sequence {X i }i=1 with means {μi }i=1 and variances σi2 i=1 .
6.2 Laws of Large Numbers and Central Limit Theorem 435
n
Theorem 6.2.9 (Rohatgi and Saleh 2001) If xi converges for a sequence {xn }∞
n=1 ,
i=1
1
n
then limb
bi xi = 0 for {bn }∞
n=1 such that bn ↑ ∞. This result is called the
n→∞ n i=1
Kronecker lemma.
∞
Example
2 ∞6.2.14 (Rohatgi and Saleh 2001) Let the means and variances be {μi }i=1
∞
and σi i=1 , respectively, for independent random variables {X i }i=1 . Then, we can
easily show that
1 a.s.
Sn − E {Sn } −→ 0 (6.2.37)
bn
∞
Example 6.2.15 (Rohatgi and Saleh 2001) If thevariances σn2 n=1 of independent
random variables are uniformly bounded, i.e., σn2 ≤ M for a finite number M, then
1 a.s.
Sn − E {Sn } −→ 0 (6.2.40)
n
∞
σn2
∞
from Kolmogorov condition because n2
≤ M
n2
< ∞. ♦
n=1 n=1
Example 6.2.16 (Rohatgi and Saleh 2001) Based on the result in Example 6.2.15,
it is easy to see that Bernoulli trials with probability of success p satisfy the strong
law of large numbers because the variance p(1 − p) is no larger than 41 .
Note that the Markov condition (6.2.33) and the Kolmogorov condition (6.2.38)
are sufficient conditions but are not necessary conditions.
Theorem 6.2.10 (Rohatgi and Saleh 2001) If the fourth moment is finite for an i.i.d.
∞
sequence {X i }i=1 with mean E {X i } = μ, then
436 6 Convergence of Random Variables
Sn
P lim = μ = 1. (6.2.41)
n→∞ n
In other words, Sn
n
converges almost surely to μ.
i=1 i=1
⎧ ⎫
⎨n
n
2 ⎬
+3E (X i − μ)2 Xj −μ
⎩ ⎭
i=1 j=1, j =i
= nE (X 1 − μ) 4
+ 3n(n − 1)σ 4
≤ cn 2 (6.2.42)
P ( Aε ) = 0 (6.2.43)
∞
Example 6.2.17 (Rohatgi and Saleh 2001) Consider an i.i.d. sequence {X i }i=1 and a
positive number B. If P (|X i | < B) = 1 for every i, then n converges almost surely
Sn
to the mean E {X i } of X i . This can be easily shown from Theorem 6.2.10 by noting
that the fourth moment is finite when P (|X i | < B) = 1. ♦
∞
Theorem 6.2.11 (Rohatgi and Saleh 2001) Consider an i.i.d. sequence {X i }i=1
with mean μ. If
then
Sn a.s.
−→ μ. (6.2.45)
n
The converse also holds true.
Note that the conditions in Theorems 6.2.10 and 6.2.11 are on the fourth moment
and absolute mean, respectively.
Example 6.2.18 (Stoyanov 2013) Consider an independent sequence {X n }∞
n=2 with
the pmf
1− 1
, x = 0,
p X n (x) = n log n
(6.2.46)
1
2n log n
, x = ±n.
∞
Letting An = {|X n | ≥ n} for n ≥ 2, we get P ( An ) → ∞ because P ( An )
n=2
∞
= 1
n log n
. P ( An ) is divergent and {X n }∞
In other words, n=2 are independent:
n=2 X n
therefore, the probability P (|X n | ≥ n occurs i.o.) = P n ≥ 1 occurs i.o. =
P lim Snn = 0 of {An occurs i.o.} is 1, i.e.,
n→∞
from Var {X k } = k
log k
, follows the weak law of large numbers.
Let us now discuss the central limit theorem (Feller 1970; Gardner 2010; Rohatgi
and Saleh 2001), the basis for the wide-spread and most popular use of the normal
n
distribution. Assume a sequence {X n }∞
n=1 and the sum Sn = X k . Assume
k=1
438 6 Convergence of Random Variables
1 l
(Sn − an ) → Y (6.2.49)
bn
and Snn ∼ C(1, 0) if X i ∼ C(1, 0): the normal and Cauchy distributions are typical
examples of the stable distribution. In this section, we discuss the conditions on
which the limit random variable Y has a normal distribution.
∞
Example 6.2.19 (Rohatgi and Saleh 2001) Assume an i.i.d. sequence √ {X i }i=1
with marginal distribution b(1, p). Letting an = E {Sn } = np and bn = Var {Sn } =
√ n
Sn −np Xi − p
np(1 − p), the mgf Mn (t) = E exp √np(1− p)
t = E exp √np(1− p)
t of
i=1
Sn −an
bn
can be obtained as
n
npt t
Mn (t) = exp − √ (1 − p) + p exp √
np(1 − p) np(1 − p)
n
pt (1 − p)t
= (1 − p) exp − √ + p exp √
np(1 − p) np(1 − p)
2 n
t 1
= 1+ +o . (6.2.50)
2n n
2
Thus, Mn (t) → exp t
2
when n → ∞ and, subsequently,
(
Sn − np 1 x
t2
P √ ≤x → √ exp − dt (6.2.51)
np(1 − p) 2π −∞ 2
2
because exp t
2
is the mgf of N (0, 1). ♦
∞
Theorem 6.2.12 (Rohatgi and Saleh 2001) For i.i.d. random variables {X i }i=1
with mean m and variance σ 2 , we have
Sn − nm l
√ → Z, (6.2.52)
nσ 2
jω 1 jω 2
ϕV (ω) = 1 + √ E{Y } + √ E Y2 + ···
nσ 2 2 nσ 2
ω2 1
= 1− +o . (6.2.53)
2n n
n −nm
n
Next, letting Z n = S√
= Vi and denoting the cf of Z n by ϕ Z n , we have
nσ 2
i=1
ω2
1 n
lim ϕ Z n (ω) = lim ϕVn (ω) = lim 1 − 2n
+o n
, i.e.,
n→∞ n→∞ n→∞
ω2
lim ϕ Z n (ω) = exp − (6.2.54)
n→∞ 2
∞
from (6.2.53) because {Vi }i=1 are independent. In short, the distribution of Z n =
n −nm
S√
2
converges to N (0, 1) as n → ∞. ♠
nσ
Theorem 6.2.12 is one of the many variants of the central limit theorem, and can
be derived from the Lindeberg’s central limit theorem introduced in Appendix 6.2.
∞
Example 6.2.20 (Rohatgi and Saleh 2001) Assume an i.i.d. sequence {X i }i=1 with
marginal pmf p X i (k) = (1 − p)k p ũ(k), where 0 < p < 1 and ũ(k) is the unit step
function in discrete space defined in (1.4.17). We have E {X i } = qp and Var {X i } =
q
p2
, where q = 1 − p. Thus, it follows from Theorem 6.2.12 that
√ Sn !
n p n −q
P √ ≤x → (x) (6.2.55)
q
for x ∈ R when n → ∞.
The central limit theorem is useful in many cases: however, it should also be noted
that there do exist cases in which the central limit theorem does not apply.
converge to the standard normal cdf (x) at some point x. This fact implies that
∞
{X i }i=1 does not satisfy the central limit theorem: the reason is that the random
variable X 1 is exceedingly large to virtually determine the distribution of Sn . ♦
The central limit theorem and laws of large numbers are satisfied for a wide
range of sequences of random variables. As we have observed in Theorem 6.2.7
and Example 6.2.15, the laws of large numbers hold true for uniformly bounded
independent sequences. As shown in Example 6.A.5 of Appendix 6.2, the central
440 6 Convergence of Random Variables
limit theorem holds true for an independent sequence even when the sum of variances
∞
diverges. Meanwhile, for an i.i.d. sequence {X i }i=1 , noting that
Sn |Sn − nm| ε√
P − m > ε =P √ > n
n σ n σ
ε√
≈ 1 − P |Z | ≤ n , (6.2.56)
σ
where Z ∼ N (0, 1), we can obtain the laws of large numbers directly from the
central limit theorem. In other words, the central limit theorem is stronger than the
laws of large numbers: yet, in the laws of large numbers we are not concerned with
the existence of the second moment. In some independent sequences for which the
central limit theorem holds true, on the other hand, the weak law of large numbers
does not hold true.
Example
6.2.22
(Feller
1970;
Rohatgi and Saleh 2001) Assume the pmf
P X k = k λ = P X k = −k λ = 21 for an independent sequence {X k }∞ k=1 , where
λ > 0. Then, the mean and variance of X k are E {X k } = 0 and Var {X k } = k 2λ ,
n
n
respectively. Now, letting sn2 = Var {X k } = k 2λ , we have
k=1 k=1
n 2λ+1 − 1
sn2 ≥ (6.2.57)
2λ + 1
n n
from k 2λ ≥ 1 x 2λ d x. Here, we can assume n > 1 without loss of generality,
k=1
−1 2λ+1
and if we let n > 2λ+1ε2
+ 1, we have ε2 > 2λ+1
n−1
and ε2 sn2 > 2λ+1 s 2 ≥ 2λ+1
n−1 n
n
n−1 2λ+1
=
λ
n 2λ + n 2λ−1 + · · · + 1 > n 2λ . Therefore, for n > 2λ+1
ε 2 + 1, we have |x kl | > n if
|xkl | > εsn . Noting in addition that P (X k = x) is non-zero only when |x| ≤ n λ , we
get
1
n
xkl2 pkl = 0. (6.2.58)
sn2 k=1 |xkl |>εsn
In short, the Lindeberg conditions3 are satisfied and the central limit theorem holds
n n
true. Now, if we consider sn2 = k 2λ ≤ 0 x 2λ d x, i.e.,
k=1
n 2λ+1
sn2 ≤ (6.2.59)
2λ + 1
3 Equations (6.A.5) and (6.A.6) in Appendix 6.2 are called the Lindeberg conditions.
6.2 Laws of Large Numbers and Central Limit Theorem 441
)
n 2λ+1
and (6.2.57), we can write sn ≈ 2λ+1
. Based on this, we have
* ! (
2λ + 1 1 b
t2
P a< Sn < b → √ exp − dt (6.2.60)
n 2λ+1 2π a 2
Let us discuss one application of the central limit theorem. First, from Theorem
6.2.12, we get the following theorem:
1.5
1.5
1 1
0.5 0.5
0 0
−1 0 1 2 −1 0 1 2 3
(A) (B)
1.5 1.5
1 1
0.5 0.5
0 0
−1 0 1 2 3 4 −1 0 1 2 3 4 5
(C) (D)
Fig. 6.1 The pdf (blue solid line) and asymptotic pdf (black dashed line) of Sn for an i.i.d. sequence
with marginal distribution U (0, 1): (A) n = 1, (B) n = 2, (C) n = 3, (D) n = 4
1 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 2 4 0 2 4 6 8
(A) (B)
1 1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 5 10 15 0 10 20 30
(C) (D)
Fig. 6.2 The cdf (blue solid line) of Sn and approximate cdf (black
dotted
line) from the central
limit theorem for an i.i.d. sequence with marginal distribution b 1, 21 : (A) n = 4, (B) n = 8, (C)
n = 16, (D) n = 32
Solution It is easy to see that Sn ∼ b n, 21 from Example 6.2.3. Noting that X i
has mean 21 and variance 41 , the asymptotic distribution of Sn is N n2 , n4 . Figure 6.2
shows the cdf and asymptotic cdf of Sn for n = 4, 8, 16, 32, which confirms that the
two cdf’s are closer when n is larger.
σ2
N m, (6.2.63)
n
x − nm
FSn (x) ≈ √ (6.2.64)
nσ 2
Sn − np x − np x − np
P √ ≤√ ≈ √ , (6.2.65)
np(1 − p) np(1 − p) np(1 − p)
1 1
P (x1 ≤ X ≤ x2 ) = P x1 − < X < x2 + (6.2.66)
2 2
Appendices
For a sequence {X n }∞
n=1 , let Fn and Mn be the cdf and mgf, respectively, of X n . We
first note that, when n → ∞, the sequence of cdf’s does not always converge and,
even when it does, the limit is not always a cdf.
which is a cdf. ♦
Example 6.A.2 (Rohatgi and Saleh 2001) Consider the cdf Fn (x) = u(x − n) of
X n . The limit of the sequence {Fn }∞
n=1 is lim Fn (x) = 0, which is not a cdf.
n→∞
Example 6.A.3 (Rohatgi and Saleh 2001) Assume the pmf P (X n = −n) = 1 for a
sequence {X n }∞
n=1 . Then, the mgf is Mn (t) = e
−nt
and its limit is lim Mn (t) = M(t),
n→∞
where
⎧
⎨ 0, t > 0,
M(t) = 1, t = 0, (6.A.3)
⎩
∞, t < 0.
The function M(t) is not an mgf. In other words, the limit of a sequence of mgf’s is
not necessarily an mgf. ♦
Example 6.A.4 Assume the pdf f n (x) = πn 1+n12 x 2 of X n . Then, the cdf is Fn (x) =
n x dt
π −∞ 1+n 2 t 2
. We also have lim f n (x) = δ(x) and lim Fn (x) = u(x). These limits
n→∞
−ε ∞n→∞
imply lim P (|X n − 0| > ε) = −∞ δ(x)d x + ε δ(x)d x = 0 and, consequently,
n→∞
{X n }∞
n=1 converges to 0 in probability. ♦
The central limit theorem can be expressed in a variety of ways. Among those vari-
eties, the Lindeberg central limit theorem is one of the most general ones and does
not require the random variables to have identical distribution.
∞
Theorem 6.A.1 (Rohatgi and Saleh 2001) For an independent sequence {X i }i=1 ,
let the mean, variance, and cdf of X i be m i , σi2 , and Fi , respectively. Let
n
sn2 = σi2 . (6.A.4)
i=1
Appendices 445
When the cdf Fi is absolutely continuous, assume that the pdf f i (x) = d
dx
Fi (x)
satisfies
n (
1
lim (x − m i )2 f i (x)d x = 0 (6.A.5)
n→∞ s 2 |x−m |>εs
n i=1 i n
∞
for every value of ε > 0. When {X i }i=1 are discrete random variables, assume the
pmf pi (x) = P (X i = x) satisfies5
1
n
lim (xil − m i )2 pi (xil ) = 0 (6.A.6)
n→∞ s 2
n i=1 |xil −m i |>εsn
Li
for every value of ε > 0, where {xil }l=1 are the jump points of Fi with L i the number
of jumps of Fi . Then, the distribution of
!
1
n
n
Xi − mi (6.A.7)
sn i=1 i=1
converges to N (0, 1) as n → ∞.
n 2
n
sn2 = Var {X k } = 13 ak → ∞ when n → ∞. Then, from the Chebyshev
k=1 k=1
Var{Y }
inequality P(|Y − E{Y }| ≥ ε) ≤ ε2
discussed in (6.A.16), we get
(
1 A2 Var {X k }
n n
x 2 Fk (x)d x ≤
sn2 k=1 sn2 k=1 ε2 sn2
|x|>εsn
A2
=
ε2 sn2
→0 (6.A.8)
n
n
n
as n→∞ because 1
sn2
x 2 Fk (x)d x ≤ 1
sn2
A2 2a1k d x = A2
sn2
k=1 |x|>εsn k=1 |x|>εsn k=1
P (|X k | > εsn ).
∞
Meanwhile, assume ak2 < ∞, and let sn2 ↑ B 2 for n → ∞. Then, for a con-
k=1
stant k, we can find εk such thatεk B < ak , and we have εk sn < εk B. Thus,
P (|X k | > εk sn ) ≥ P (|X k | > εk B) > 0. Based on this result, for n ≥ k, we get
5 As mentioned in Example 6.2.22 already, (6.A.5) and (6.A.6) are called the Lindeberg condition.
446 6 Convergence of Random Variables
( (
1 sn2 εk2
n n
x 2 F j (x)d x ≥ F j (x)d x
sn2 j=1 sn2 j=1
|x|>εk sn |x|>εk sn
because x 2 < |x| from |x|δ x 2 > |εsn |δ x 2 when |x| > εsn . We can similarly show
2+δ
εδ snδ
that the central limit theorem holds true in discrete random variables. ♦
The conditions (6.A.5) and (6.A.6) are necessary conditions in the following sense: for a sequence $\{X_i\}_{i=1}^{\infty}$ of independent random variables, assume the variances $\left\{\sigma_i^2\right\}_{i=1}^{\infty}$ of $\{X_i\}_{i=1}^{\infty}$ are finite. If the pdf of $X_i$ satisfies (6.A.5) or the pmf of $X_i$ satisfies (6.A.6) for every value of $\varepsilon > 0$, then

$$\lim_{n\to\infty} P\!\left(\frac{\overline{X}_n - E\left\{\overline{X}_n\right\}}{\sqrt{\mathrm{Var}\left\{\overline{X}_n\right\}}} \le x\right) = \Phi(x) \tag{6.A.11}$$

and

$$\lim_{n\to\infty} P\!\left(\max_{1\le k\le n} |X_k - E\{X_k\}| > n\varepsilon\sqrt{\mathrm{Var}\left\{\overline{X}_n\right\}}\right) = 0, \tag{6.A.12}$$

and the converse also holds true, where $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample mean of $\{X_i\}_{i=1}^{n}$ defined in (5.4.1).
(11) If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} a$, then $X_n \pm Y_n \xrightarrow{d} X \pm a$; $X_n Y_n \xrightarrow{d} aX$ for $a \ne 0$; $X_n Y_n \xrightarrow{p} 0$ for $a = 0$; and $\frac{X_n}{Y_n} \xrightarrow{d} \frac{X}{a}$ for $a \ne 0$.
(12) If $X_n \xrightarrow{r=2} X$, then $\lim_{n\to\infty} E\{X_n\} = E\{X\}$ and $\lim_{n\to\infty} E\left\{X_n^2\right\} = E\left\{X^2\right\}$.
(13) If $X_n \xrightarrow{L^r} X$, then $\lim_{n\to\infty} E\{|X_n|^r\} = E\{|X|^r\}$.
(14) If $X_1 > X_2 > \cdots > 0$ and $X_n \xrightarrow{p} 0$, then $X_n \xrightarrow{a.s.} 0$.
$$A_n = \prod_{k=1}^{n} a_k. \tag{6.A.13}$$

The infinite product $\prod_{k=1}^{\infty} a_k$ is called convergent to the limit $A$ when $A_n \to A$ and $A \ne 0$ for $n \to \infty$; divergent to 0 when $A_n \to 0$; and divergent when $A_n$ does not converge to a non-zero value. The convergence of products is often related to the convergence of sums as shown below.

(1) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are positive, the convergence of $\prod_{k=1}^{\infty} a_k$ and that of $\sum_{k=1}^{\infty} \ln a_k$ are necessary and sufficient conditions for each other.
(2) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are positive, the convergence of $\prod_{k=1}^{\infty} (1 + a_k)$ and that of $\sum_{k=1}^{\infty} a_k$ are necessary and sufficient conditions for each other.
(3) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are non-negative, the convergence of $\prod_{k=1}^{\infty} (1 - a_k)$ and that of $\sum_{k=1}^{\infty} a_k$ are necessary and sufficient conditions for each other.
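Property (2) can be illustrated numerically; the sketch below (an added aside) compares partial products of $\prod(1 + a_k)$ with partial sums of $a_k$ for the assumed choices $a_k = 1/k^2$ (convergent sum) and $a_k = 1/k$ (divergent sum).

```python
# Sketch: partial products of prod(1 + a_k) versus partial sums of a_k.
# With a_k = 1/k^2 the sum converges and the product settles; with a_k = 1/k
# the sum diverges and the product keeps growing.
def partials(a, n):
    s, p = 0.0, 1.0
    for k in range(1, n + 1):
        s += a(k)
        p *= 1.0 + a(k)
    return s, p

for n in (10, 100, 1000, 10000):
    s1, p1 = partials(lambda k: 1.0 / k**2, n)  # sum converges (to pi^2/6)
    s2, p2 = partials(lambda k: 1.0 / k, n)     # harmonic sum diverges
    print(f"n={n:5d}  a_k=1/k^2: sum={s1:.4f} prod={p1:.4f} | "
          f"a_k=1/k: sum={s2:.4f} prod={p2:.1f}")
```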
$$P(h(X) \ge \varepsilon) \le \frac{E\{h(X)\}}{\varepsilon} \tag{6.A.14}$$

for $\varepsilon > 0$, which is called the tail probability inequality.

Proof Assume $X$ is a discrete random variable. Letting $P(X = x_k) = p_k$, we have $E\{h(X)\} = \sum_{A} h(x_k) p_k + \sum_{A^c} h(x_k) p_k \ge \sum_{A} h(x_k) p_k$ when $A = \{k : h(x_k) \ge \varepsilon\}$: this yields $E\{h(X)\} \ge \varepsilon\sum_{A} p_k = \varepsilon P(h(X) \ge \varepsilon)$ and, subsequently, (6.A.14). ♠

$$P(X \ge \alpha) \le \frac{E\{X\}}{\alpha} \tag{6.A.15}$$

for $\alpha > 0$, which is called the Markov inequality.

The Markov inequality can be proved easily from (6.A.14) by letting $h(X) = |X|$ and $\varepsilon = \alpha$. We can also show the Markov inequality from $E\{X\} = \int_{0}^{\infty} x f_X(x)\,dx \ge \int_{\alpha}^{\infty} x f_X(x)\,dx \ge \alpha\int_{\alpha}^{\infty} f_X(x)\,dx = \alpha P(X \ge \alpha)$ by recollecting that a pdf is non-negative.
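As an added numerical aside, the Markov inequality (6.A.15) can be checked by simulation; the exponential distribution and sample size below are illustrative assumptions, not taken from the text.

```python
# Sketch: Monte Carlo check of the Markov inequality P(X >= a) <= E{X}/a
# for a non-negative random variable (exponential with mean 1, chosen only
# as an illustration).
import random

random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100000)]  # E{X} = 1
mean = sum(samples) / len(samples)

for a in (1.0, 2.0, 4.0):
    prob = sum(x >= a for x in samples) / len(samples)
    print(f"a={a}:  P(X >= a) ~ {prob:.4f}  <=  E{{X}}/a ~ {mean / a:.4f}")
```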
Theorem 6.A.4 The mean $E\{Y\}$ and variance $\mathrm{Var}\{Y\}$ of any random variable $Y$ satisfy

$$P(|Y - E\{Y\}| \ge \varepsilon) \le \frac{\mathrm{Var}\{Y\}}{\varepsilon^2} \tag{6.A.16}$$

for any $\varepsilon > 0$, which is called the Chebyshev inequality.

Proof The random variable $X = (Y - E\{Y\})^2$ is non-negative. Thus, if we use (6.A.15), we get $P\!\left([Y - E\{Y\}]^2 \ge \varepsilon^2\right) \le \frac{1}{\varepsilon^2}E\left\{[Y - E\{Y\}]^2\right\} = \frac{\mathrm{Var}\{Y\}}{\varepsilon^2}$. Now, noting that $P\!\left([Y - E\{Y\}]^2 \ge \varepsilon^2\right) = P(|Y - E\{Y\}| \ge \varepsilon)$, we get (6.A.16). ♠
Theorem 6.A.5 (Rohatgi and Saleh 2001) The absolute mean $E\{|X|\}$ of any random variable $X$ satisfies

$$\sum_{n=1}^{\infty} P(|X| \ge n) \le E\{|X|\} \le 1 + \sum_{n=1}^{\infty} P(|X| \ge n), \tag{6.A.17}$$

⁶ The inequality (6.A.15) holds true also when $X$ is replaced by $|X|^r$ for $r > 0$.

$$\sum_{k=0}^{\infty} k\,P(k \le |X| < k+1) \le E\{|X|\} \le \sum_{k=0}^{\infty} (k+1)\,P(k \le |X| < k+1). \tag{6.A.18}$$

Now, employing $\sum_{k=0}^{\infty} k\,P(k \le |X| < k+1) = \sum_{n=1}^{\infty}\sum_{k=n}^{\infty} P(k \le |X| < k+1) = \sum_{n=1}^{\infty} P(|X| \ge n)$ and $\sum_{k=0}^{\infty} (k+1)\,P(k \le |X| < k+1) = 1 + \sum_{k=0}^{\infty} k\,P(k \le |X| < k+1) = 1 + \sum_{n=1}^{\infty} P(|X| \ge n)$ in (6.A.18), we get (6.A.17). A similar procedure will show the result for discrete random variables. ♠
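The bounds in (6.A.17) can be illustrated numerically. The following minimal sketch (an addition; taking $X$ to be exponential with mean 1 is purely an assumed example) evaluates both sides.

```python
# Sketch: numerical check of (6.A.17) for X ~ exponential(1), an illustrative
# choice.  Here E{|X|} = 1 and P(|X| >= n) = exp(-n).
from math import exp

tail_sum = sum(exp(-n) for n in range(1, 200))   # sum_{n>=1} P(|X| >= n)
mean_abs = 1.0                                   # E{|X|} for exponential(1)
print(f"lower = {tail_sum:.4f} <= E|X| = {mean_abs} <= upper = {1 + tail_sum:.4f}")
```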
Proof Let $m = E\{X\}$. Then, from the intermediate value theorem, we have

$$h(X) = h(m) + (X - m)h'(m) + \frac{1}{2}(X - m)^2 h''(\alpha) \tag{6.A.20}$$

for $-\infty < \alpha < \infty$. Taking the expectation of the above equation, we get $E\{h(X)\} = h(m) + \frac{1}{2}h''(\alpha)\sigma_X^2$. Recollecting that $h''(\alpha) \ge 0$ and $\sigma_X^2 \ge 0$, we get $E\{h(X)\} \ge h(m) = h(E\{X\})$. ♠
Theorem 6.A.7 (Rohatgi and Saleh 2001) If the $n$-th absolute moment $E\{|X|^n\}$ is finite, then

$$\left(E\{|X|^s\}\right)^{\frac{1}{s}} \le \left(E\{|X|^r\}\right)^{\frac{1}{r}} \tag{6.A.21}$$
7 A function h is called convex or concave up when h(t x + (1 − t)y) ≤ th(x) + (1 − t)h(y) for
every two points x and y and for every choice of t ∈ [0, 1]. A convex function is a continuous
function with a non-decreasing derivative and is differentiable except at a countable number of
points. In addition, the second order derivative of a convex function, if it exists, is non-negative.
where $f$ is the pdf of $X$. Letting $\beta_n = E\{|X|^n\}$, (6.A.22) can be written as $Q(u, v) = (u\ v)\begin{pmatrix}\beta_{k-1} & \beta_k\\ \beta_k & \beta_{k+1}\end{pmatrix}(u\ v)^T$. Now, we have $\begin{vmatrix}\beta_{k-1} & \beta_k\\ \beta_k & \beta_{k+1}\end{vmatrix} \ge 0$, i.e., $\beta_k^2 \le \beta_{k-1}\beta_{k+1}$, because $Q \ge 0$ for every choice of $u$ and $v$. Therefore, we have

$$\beta_1^2 \le \beta_0\,\beta_2,\quad \beta_2^4 \le \beta_1^2\,\beta_3^2,\quad \cdots,\quad \beta_{n-1}^{2(n-1)} \le \beta_{n-2}^{\,n-1}\,\beta_n^{\,n-1} \tag{6.A.23}$$
$$P(|X| \ge \varepsilon) \le \frac{E\{g(|X|)\}}{g(\varepsilon)} \tag{6.A.24}$$

$$P(|X| \ge \varepsilon) \le \frac{E\{|X|^r\}}{\varepsilon^r} \tag{6.A.25}$$

for $\varepsilon > 0$, which is called the Bienayme-Chebyshev inequality.
(B) Inequalities of Random Vectors

Theorem 6.A.10 (Rohatgi and Saleh 2001) For two random variables $X$ and $Y$, we have

$$E^2\{XY\} \le E\left\{X^2\right\} E\left\{Y^2\right\}, \tag{6.A.26}$$

for real numbers $a$ and $b$. Now, if $E\left\{X^2\right\} = 0$, then $P(X = 0) = 1$ and thus $E\{XY\} = 0$, implying that (6.A.26) holds true. Next, when $E\left\{X^2\right\} > 0$, recollecting that $E\left\{(\alpha X + Y)^2\right\} = \alpha^2 E\left\{X^2\right\} + 2\alpha E\{XY\} + E\left\{Y^2\right\} \ge 0$ for any real number $\alpha$, we have $\frac{E^2\{XY\}}{E\{X^2\}} - 2\frac{E^2\{XY\}}{E\{X^2\}} + E\left\{Y^2\right\} \ge 0$ by letting $\alpha = -\frac{E\{XY\}}{E\{X^2\}}$. This inequality is equivalent to (6.A.26). ♠
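A quick Monte Carlo check of (6.A.26) is given below (an added aside; the dependent pair $Y = X + \text{noise}$ and the sample size are assumed for illustration only).

```python
# Sketch: Monte Carlo check of E^2{XY} <= E{X^2} E{Y^2} for an illustrative
# pair of dependent variables (Y = X + noise).
import random

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(100000)]
ys = [x + 0.5 * random.gauss(0.0, 1.0) for x in xs]

n = len(xs)
exy = sum(x * y for x, y in zip(xs, ys)) / n
ex2 = sum(x * x for x in xs) / n
ey2 = sum(y * y for y in ys) / n
print(f"E^2{{XY}} = {exy**2:.4f}  <=  E{{X^2}}E{{Y^2}} = {ex2 * ey2:.4f}")
```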
Theorem 6.A.11 (Rohatgi and Saleh 2001) For zero-mean independent random variables $\{X_i\}_{i=1}^{n}$ with variances $\left\{\sigma_i^2\right\}_{i=1}^{n}$, let $S_k = \sum_{j=1}^{k} X_j$. Then,

$$P\!\left(\max_{1\le k\le n} |S_k| > \varepsilon\right) \le \frac{\sum_{i=1}^{n}\sigma_i^2}{\varepsilon^2} \tag{6.A.27}$$

from (6.A.29). Subsequently, using $E\left\{S_n^2\sum_{k=1}^{n} K_{B_k}(S_k)\right\} = E\left\{S_n^2 K_{A_n^c}(S_n)\right\} \le E\left\{S_n^2\right\} = \sum_{k=1}^{n}\sigma_k^2$ and (6.A.30), we get $\sum_{k=1}^{n}\sigma_k^2 \ge \varepsilon^2\sum_{k=1}^{n} P(B_k) = \varepsilon^2 P\!\left(A_n^c\right)$, which is the same as (6.A.27). ♠
Example 6.A.7 (Rohatgi and Saleh 2001) The Chebyshev inequality (6.A.16) with $E\{Y\} = 0$, i.e.,

$$P(|Y| > \varepsilon) \le \frac{\mathrm{Var}\{Y\}}{\varepsilon^2}, \tag{6.A.31}$$

is the same as the Kolmogorov inequality (6.A.27) with $n = 1$. ♦
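The Kolmogorov inequality (6.A.27) can also be checked by simulation; the sketch below is an added illustration with zero-mean uniform steps, an assumed choice not taken from the text.

```python
# Sketch: Monte Carlo check of the Kolmogorov inequality (6.A.27) with
# zero-mean steps X_i ~ U(-1, 1) (variance 1/3 each); illustrative only.
import random

random.seed(2)
n, eps, trials = 20, 3.0, 20000
var_sum = n / 3.0                      # sum of the n variances

count = 0
for _ in range(trials):
    s, hit = 0.0, False
    for _ in range(n):
        s += random.uniform(-1.0, 1.0)
        if abs(s) > eps:
            hit = True
            break
    count += hit
print(f"P(max|S_k| > {eps}) ~ {count / trials:.4f}  <=  {var_sum / eps**2:.4f}")
```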
and

$$P(Y_n \le \alpha) \le \exp\left\{-n\left[t_r\,g'(t_r) - g(t_r)\right]\right\}, \quad t_r \le 0. \tag{6.A.33}$$

The inequalities (6.A.32) and (6.A.33) are called the Chernoff bounds. When $t_r = 0$, the right-hand sides of the two inequalities (6.A.32) and (6.A.33) are both 1 from $g(t_r) = \ln M(t_r) = \ln M(0) = 0$: in other words, the Chernoff bounds simply say that the probability is no larger than 1 when $t_r = 0$, and thus the Chernoff bounds are more useful when $t_r \ne 0$.

$$P(X \ge \alpha) \le \exp\left(-\frac{\alpha^2}{2}\right), \quad \alpha \ge 0 \tag{6.A.34}$$

and

$$P(X \le \alpha) \le \exp\left(-\frac{\alpha^2}{2}\right), \quad \alpha \le 0 \tag{6.A.35}$$

♦
Example 6.A.9 For $X \sim P(\lambda)$, assume $n = 1$ and $Y_1 = X$. From the mgf $M(t) = \exp\{\lambda(e^t - 1)\}$, we get $g(t) = \ln M(t) = \lambda(e^t - 1)$ and $g'(t) = \lambda e^t$. Solving $\alpha = n g'(t) = \lambda e^t$, we get $t_r = \ln\frac{\alpha}{\lambda}$. Thus, $t_r > 0$ when $\alpha > \lambda$, $t_r = 0$ when $\alpha = \lambda$, and $t_r < 0$ when $\alpha < \lambda$. Therefore, we have

$$P(X \ge \alpha) \le e^{-\lambda}\left(\frac{e\lambda}{\alpha}\right)^{\alpha}, \quad \alpha \ge \lambda \tag{6.A.36}$$

and

$$P(X \le \alpha) \le e^{-\lambda}\left(\frac{e\lambda}{\alpha}\right)^{\alpha}, \quad \alpha \le \lambda \tag{6.A.37}$$

because $n\left[t_r\,g'(t_r) - g(t_r)\right] = \alpha\ln\frac{\alpha}{\lambda} - \alpha + \lambda$ from $g(t_r) = \lambda\left(\frac{\alpha}{\lambda} - 1\right) = \alpha - \lambda$ and $t_r\,g'(t_r) = \alpha\ln\frac{\alpha}{\lambda}$.
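As an added numerical illustration of Example 6.A.9, the sketch below compares the Chernoff bound (6.A.36) with the exact Poisson tail; the values of $\lambda$ and $\alpha$ are assumptions chosen only for the demonstration.

```python
# Sketch: exact Poisson tail P(X >= alpha) versus the Chernoff bound (6.A.36)
# for X ~ P(lambda); lambda = 2 and integer alpha are illustrative choices.
from math import exp, factorial

lam = 2.0

def poisson_tail(alpha, lam):
    """Exact P(X >= alpha) for X ~ P(lam), integer alpha."""
    return 1.0 - sum(exp(-lam) * lam**k / factorial(k) for k in range(alpha))

for alpha in (3, 5, 8):
    bound = exp(-lam) * (exp(1) * lam / alpha) ** alpha
    print(f"alpha={alpha}:  exact={poisson_tail(alpha, lam):.5f}  bound={bound:.5f}")
```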
Theorem 6.A.13 If $p$ and $q$ are both larger than 1 and $\frac{1}{p} + \frac{1}{q} = 1$, then

$$E\{|XY|\} \le E^{\frac{1}{p}}\left\{|X|^p\right\}\,E^{\frac{1}{q}}\left\{|Y|^q\right\}, \tag{6.A.38}$$
Exercises

Exercise 6.1 For the sample space $[0, 1]$, consider a sequence of random variables defined by

$$X_n(\omega) = \begin{cases} 1, & \omega \le \frac{1}{n}, \\ 0, & \omega > \frac{1}{n} \end{cases} \tag{6.E.1}$$

and let $X(\omega) = 0$ for $\omega \in [0, 1]$. Assume the probability measure $P(a \le \omega \le b) = b - a$, the Lebesgue measure mentioned following (2.5.24), for $0 \le a \le b \le 1$. Discuss if $\{X_n(\omega)\}_{n=1}^{\infty}$ converges to $X(\omega)$ surely or almost surely.
Exercise 6.2 For the sample space $[0, 1]$, consider the sequence

$$X_n(\omega) = \begin{cases} 3, & 0 \le \omega < \frac{1}{2n}, \\ 4, & 1 - \frac{1}{2n} < \omega \le 1, \\ 5, & \frac{1}{2n} < \omega < 1 - \frac{1}{2n} \end{cases} \tag{6.E.2}$$

and let $X(\omega) = 5$ for $\omega \in [0, 1]$. Assuming the probability measure $P(a \le \omega \le b) = b - a$ for $0 \le a \le b \le 1$, discuss if $\{X_n(\omega)\}_{n=1}^{\infty}$ converges to $X(\omega)$ surely or almost surely.
Exercise 6.3 When $\{X_i\}_{i=1}^{n}$ are independent random variables, obtain the distribution of $S_n = \sum_{i=1}^{n} X_i$ in each of the following five cases of the distribution of $X_i$:
(1) geometric distribution with parameter $\alpha$,
(2) $NB(r_i, \alpha)$,
(3) $P(\lambda_i)$,
(4) $G(\alpha_i, \beta)$, and
(5) $C(\mu_i, \theta_i)$.
Exercise 6.4 To what does $\frac{S_n}{n}$ converge in Example 6.2.10?

Exercise 6.5 Let $Y = \frac{X - \lambda}{\sqrt{\lambda}}$ for a Poisson random variable $X \sim P(\lambda)$. Noting that the mgf of $X$ is $M_X(t) = \exp\left\{\lambda\left(e^t - 1\right)\right\}$, show that $Y$ converges to a standard normal random variable as $\lambda \to \infty$.
Exercise 6.6 For a sequence $\{X_n\}_{n=1}^{\infty}$ with the pmf

$$P(X_n = x) = \begin{cases} \frac{1}{n}, & x = 1, \\ 1 - \frac{1}{n}, & x = 0, \end{cases} \tag{6.E.3}$$

show that $X_n \xrightarrow{l} X$, where $X$ has the distribution $P(X = 0) = 1$.
Exercise 6.7 Discuss if the weak law of large numbers holds true for a sequence of i.i.d. random variables with marginal pdf $f(x) = \frac{1+\alpha}{x^{2+\alpha}}\,u(x - 1)$, where $\alpha > 0$.
Exercise 6.8 Show that $S_n = \sum_{k=1}^{n} X_k$ converges to a Poisson random variable with distribution $P(np)$ when $n \to \infty$ for an i.i.d. sequence $\{X_n\}_{n=1}^{\infty}$ with marginal distribution $b(1, p)$.
Exercise 6.9 Discuss the central limit theorem for an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with marginal distribution $B(\alpha, \beta)$.
Exercise 6.10 An i.i.d. sequence $\{X_i\}_{i=1}^{n}$ has marginal distribution $P(\lambda)$. When $n$ is large enough, we can approximate as $S_n = \sum_{k=1}^{n} X_k \sim N(n\lambda, n\lambda)$. Using the continuity correction, obtain the probability $P(50 < S_n \le 80)$.
Exercise 6.11 Consider an i.i.d. Bernoulli sequence $\{X_i\}_{i=1}^{n}$ with $P(X_i = 1) = p$, a binomial random variable $M \sim b(n, p)$ which is independent of $\{X_i\}_{i=1}^{n}$, and $K = \sum_{i=1}^{n} X_i$. Note that $K$ is the number of successes in $n$ i.i.d. Bernoulli trials. Obtain the expected values of $U = \sum_{i=1}^{K} X_i$ and $V = \sum_{i=1}^{M} X_i$.
Exercise 6.12 The result of a game is independent of the other games, and the probabilities of winning and losing are each $\frac{1}{2}$. Assume there is no tie. When a person wins a round, the person gets 2 points and then continues. On the other hand, if the person loses a round, the person gets 0 points and stops. Obtain the mgf, expected value, and variance of the score $Y$ that the person may get from the games.
Exercise 6.13 Let $P_n$ be the probability that we have more heads than tails in a toss of $n$ fair coins.
(1) Obtain $P_3$, $P_4$, and $P_5$.
(2) Obtain the limit $\lim_{n\to\infty} P_n$.
Exercise 6.14 For an i.i.d. sequence $\{X_n \sim N(0, 1)\}_{n=1}^{\infty}$, let the cdf of $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be $F_n$. Obtain $\lim_{n\to\infty} F_n(x)$ and discuss whether the limit is a cdf or not.
Exercise 6.18 In the sequence $\{Y_i = X + W_i\}_{i=1}^{n}$, $X$ and $\{W_i\}_{i=1}^{n}$ are independent of each other, and $\left\{W_i \sim N\left(0, \sigma_i^2\right)\right\}_{i=1}^{n}$ is an i.i.d. sequence, where $\sigma_i^2 \le \sigma_{\max}^2 < \infty$. We estimate $X$ via

$$\hat{X}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i \tag{6.E.5}$$
Exercise 6.19 Assume an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with marginal pdf $f(x) = e^{-x+\theta}u(x - \theta)$. Show that

$$\min(X_1, X_2, \ldots, X_n) \xrightarrow{p} \theta \tag{6.E.6}$$

and that

$$Y \xrightarrow{p} 1 + \theta \tag{6.E.7}$$

for $Y = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Exercise 6.20 Show that $\max(X_1, X_2, \ldots, X_n) \xrightarrow{p} \theta$ for an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with marginal distribution $U(0, \theta)$.

Let $\{Y_n\}_{n=1}^{\infty}$ and $\{Z_n\}_{n=1}^{\infty}$ be defined by $Y_n = \max(X_1, X_2, \ldots, X_n)$ and $Z_n = n(1 - Y_n)$. Show that the sequence $\{Z_n\}_{n=1}^{\infty}$ converges in distribution to a random variable $Z$ with cdf $F(z) = \left(1 - e^{-z}\right)u(z)$.
Exercise 6.22 For the sample space $\Omega = \{1, 2, \ldots\}$ and probability measure $P(n) = \frac{\alpha}{n^2}$, assume a sequence $\{X_n\}_{n=1}^{\infty}$ such that

$$X_n(\omega) = \begin{cases} n, & \omega = n, \\ 0, & \omega \ne n. \end{cases} \tag{6.E.9}$$

we have $X_n \xrightarrow{r=2} 0$ because $\lim_{n\to\infty} E\left\{X_n^2\right\} = \lim_{n\to\infty}\frac{1}{n} = 0$. Show that the sequence $\{X_n\}_{n=1}^{\infty}$ does not converge almost surely.
as n → ∞.)
(2) Show that the sample mean of n i.i.d. Cauchy random variables is a Cauchy
random variable.
Exercise 6.29 Assume an i.i.d. sequence $\{X_i \sim P(0.02)\}_{i=1}^{100}$. For $S = \sum_{i=1}^{100} X_i$, obtain the value $P(S \ge 3)$ using the central limit theorem and compare it with the exact value.
among which four are shown in Fig. 6.3. Obtain $\lim_{n\to\infty} F_n(x)$ and discuss if $\frac{d}{dx}\left[\lim_{n\to\infty} F_n(x)\right]$ is the same as $\lim_{n\to\infty}\left[\frac{d}{dx} F_n(x)\right]$.

Fig. 6.3 Four cdf's $F_n(x) = x\left(1 - \frac{\sin(2n\pi x)}{2n\pi x}\right)$ for $n = 1, 4, 16$, and $64$ on $x \in [0, 1]$
Exercise 6.31 Assume an i.i.d. sequence $\left\{X_i \sim \chi^2(1)\right\}_{i=1}^{\infty}$. Then, we have $S_n \sim \chi^2(n)$, $E\{S_n\} = n$, and $\mathrm{Var}\{S_n\} = 2n$. Thus, letting $Z_n = \frac{1}{\sqrt{2n}}(S_n - n) = \sqrt{\frac{n}{2}}\left(\frac{S_n}{n} - 1\right)$, the mgf $M_n(t) = E\left\{e^{tZ_n}\right\} = \exp\left(-t\sqrt{\frac{n}{2}}\right)\left(1 - \frac{2t}{\sqrt{2n}}\right)^{-\frac{n}{2}}$ of $Z_n$ can be obtained as

$$M_n(t) = \left[\exp\left(t\sqrt{\frac{2}{n}}\right) - t\sqrt{\frac{2}{n}}\exp\left(t\sqrt{\frac{2}{n}}\right)\right]^{-\frac{n}{2}} \tag{6.E.14}$$

for $t < \sqrt{\frac{n}{2}}$. In addition, from Taylor approximation, we get

$$\exp\left(t\sqrt{\frac{2}{n}}\right) = 1 + t\sqrt{\frac{2}{n}} + \frac{t^2}{2}\left(\sqrt{\frac{2}{n}}\right)^2 + \frac{1}{6}\left(t\sqrt{\frac{2}{n}}\right)^3\exp(\theta_n) \tag{6.E.15}$$

for $0 < \theta_n < t\sqrt{\frac{2}{n}}$. Show that $Z_n \xrightarrow{l} Z \sim N(0, 1)$.
Exercise 6.33 For a facsimile (fax), the number $W$ of pages sent is a geometric random variable with pmf $p_W(k) = \frac{3^{k-1}}{4^k}$ for $k \in \{1, 2, \ldots\}$ and mean $\frac{1}{\beta} = 4$. The amount $B_i$ of information contained in the $i$-th page is a geometric random variable with pmf $p_B(k) = 10^{-5}\left(1 - 10^{-5}\right)^{k-1}$ for $k \in \{1, 2, \ldots\}$ with expected value $\frac{1}{\alpha} = 10^5$. Assuming that $\{B_i\}_{i=1}^{\infty}$ is an i.i.d. sequence and that $W$ and $\{B_i\}_{i=1}^{\infty}$ are independent of each other, obtain the distribution of the total amount of information sent via this fax.
Exercise 6.34 Consider a sequence $\{X_i\}_{i=1}^{\infty}$ of i.i.d. exponential random variables with mean $\frac{1}{\lambda}$. A geometric random variable $N$ has mean $\frac{1}{p}$ and is independent of $\{X_i\}_{i=1}^{\infty}$. Obtain the expected value and variance of the random sum $S_N = \sum_{i=1}^{N} X_i$.
Exercise 6.35 Depending on the weather, the number $N$ of icicles has the pmf $p_N(n) = \frac{1}{10}2^{2-|3-n|}$ for $n = 1, 2, \ldots, 5$, and the lengths $\{L_i\}_{i=1}^{\infty}$ of icicles are i.i.d. with marginal pdf $f_L(v) = \lambda e^{-\lambda v}u(v)$. In addition, $N$ and $\{L_i\}_{i=1}^{\infty}$ are independent of each other. Obtain the expected value of the sum $T$ of the lengths of the icicles.
Exercise 6.36 Check if the following sequences of cdf's are convergent, and if yes, obtain the limit:
(1) sequence $\{F_n(x)\}_{n=1}^{\infty}$ with cdf

$$F_n(x) = \begin{cases} 0, & x < -n, \\ \frac{1}{2n}(x + n), & -n \le x < n, \\ 1, & x \ge n. \end{cases} \tag{6.E.16}$$
F(x).
References
Chapter 1 Preliminaries
m+1
l
, where n = 2k × 3l × 5m · · · is the factorization of n in prime
factors.
(3) For an element $x = 0.\alpha_1\alpha_2\cdots$ of the Cantor set, let $f(x) = 0.\frac{\alpha_1}{2}\frac{\alpha_2}{2}\cdots$.
(4) a sequence (α1 , α2 , . . .) of 0 and 1 → the number 0.α1 α2 · · · .
Exercise 1.12 (1) When two intervals $(a, b)$ and $(c, d)$ are both finite, $f(x) = c + (d - c)\frac{x - a}{b - a}$.
When $a$ is finite, $b = \infty$, and $(c, d)$ is finite, $f(x) = c + \frac{2}{\pi}(d - c)\arctan(x - a)$.
Similarly in other cases.
Similarly in other cases.
(2) S1 → S2 , where S2 is an infinite sequence of 0 and 1 obtained by replacing 1 with
(1, 0) and 2 with (1, 1) in an infinite sequence S1 = (a0 , a1 , . . .) of 0, 1, and 2.
Exercise 1.29 $\int_{-\infty}^{\infty} (\cos x + \sin x)\,\delta\!\left(x^3 + x^2 + x\right) dx = 1$.
Exercise
1.31
∞ ∞
(1) 1 + n1 , 2 n=1 → (1, 2). (2) 1 + n1 , 2 n=1 → (1, 2].
∞ ∞
(3) 1, 1 + n1 n=1 → (1, 1] = ∅. (4) 1, 1 + n1 n=1 → [1, 1] = {1}.
∞ ∞
(5) 1 − n1 , 2 n=1 → [1, 2). (6) 1 − n1 , 2 n=1 → [1, 2].
∞ ∞
(7) 1, 2 − n1 n=1 → (1, 2). (8) 1, 2 − n1 n=1 → [1, 2).
Exercise 1.32 $\int_0^1 \lim_{n\to\infty} f_n(x)\,dx = 0$. $\lim_{n\to\infty}\int_0^1 f_n(x)\,dx = \frac{1}{2}$.
Exercise 1.33 $\int_0^b \lim_{n\to\infty} f_n(x)\,dx = 0$. $\lim_{n\to\infty}\int_0^b f_n(x)\,dx = \infty$.
Exercise 1.34 The number of all possible arrangements with ten distinct red balls
and ten distinct black balls = 20! ≈ 2.43 × 1018 .
Exercise 1.41 When $p > 0$, ${}_p\mathrm{C}_0 - {}_p\mathrm{C}_1 + {}_p\mathrm{C}_2 - {}_p\mathrm{C}_3 + \cdots = 0$.
When $p > 0$, ${}_p\mathrm{C}_0 + {}_p\mathrm{C}_1 + {}_p\mathrm{C}_2 + {}_p\mathrm{C}_3 + \cdots = 2^p$.
When $p > 0$, $\sum_{k=0}^{\infty} {}_p\mathrm{C}_{2k+1} = \sum_{k=0}^{\infty} {}_p\mathrm{C}_{2k} = 2^{p-1}$.
Exercise 1.42 $(1 + z)^{\frac{1}{2}} = 1 + \frac{z}{2} - \frac{z^2}{8} + \frac{z^3}{16} - \cdots$.
$(1 + z)^{-\frac{1}{2}} = \begin{cases} 1 - \frac{1}{2}z + \frac{3}{8}z^2 - \frac{5}{16}z^3 + \frac{35}{128}z^4 - \cdots, & |z| < 1, \\ z^{-\frac{1}{2}} - \frac{1}{2}z^{-\frac{3}{2}} + \frac{3}{8}z^{-\frac{5}{2}} - \frac{5}{16}z^{-\frac{7}{2}} + \frac{35}{128}z^{-\frac{9}{2}} - \cdots, & |z| > 1. \end{cases}$
Exercise 2.1 F (C) = {∅, {a}, {b}, {a, b}, {c, d}, {b, c, d}, {a, c, d}, S}.
Exercise 2.2 σ (C) = {∅, {a}, {b}, {a, b}, {c, d}, {b, c, d}, {a, c, d}, S}.
Exercise 2.3 (1) Denoting the lifetime of the battery by t, S = {t : 0 ≤ t < ∞}.
(2) S = {(n, m) : (0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)}.
(3) S = {(1, red), (2, red), (3, green), (4, green), (5, blue)}.
Exercise 2.4 P ( AB c + B Ac ) = 0 when P(A) = P(B) = P(AB).
Exercise 2.5 $P(A \cup B) = \frac{1}{2}$. $P(A \cup C) = \frac{2}{3}$. $P(A \cup B \cup C) = 1$.
Exercise 2.6 (1) C = Ac ∩ B.
Exercise 2.7 The probability that red balls and black balls are placed in an alternating fashion $= \frac{2 \times 10! \times 10!}{20!} \approx 1.08 \times 10^{-5}$.
Exercise 2.8 P(two nodes are disconnected) = p 2 (2 − p).
Exercise 2.12 Buying 50 tickets in one week brings us a higher probability of getting
the winning ticket than buying one ticket over 50 weeks.
$\frac{1}{2}$ versus $1 - \left(\frac{99}{100}\right)^{50} \approx 0.395$.
Exercise 2.13 $\frac{1}{6}$.
Exercise 2.14 $\frac{1}{2}$.
Exercise 2.15 $P\!\left(C \cap (A - B)^c = \emptyset\right) = \frac{5}{8}$.
Exercise 2.16 $p_{n,A} = -\frac{1}{4}\left(-\frac{1}{3}\right)^{n-1} + \frac{1}{4}$. $p_{n,B} = \frac{1}{12}\left(-\frac{1}{3}\right)^{n-1} + \frac{1}{4}$.
$p_{10,A} = \frac{1}{4}\left[1 - \left(-\frac{1}{3}\right)^{9}\right] \approx 0.250013$. $p_{10,B} = \frac{1}{4}\left[1 - \left(-\frac{1}{3}\right)^{10}\right] \approx 0.249996$.
Exercise 2.17 (1), (2) probability of no match
$= \begin{cases} 0, & N = 1, \\ \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^N}{N!}, & N = 2, 3, \ldots. \end{cases}$
(3) probability of $k$ matches
$= \begin{cases} \frac{1}{k!}\left[\frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{N-k}}{(N-k)!}\right], & k = 0, 1, \ldots, N - 2, \\ 0, & k = N - 1, \\ \frac{1}{k!}, & k = N. \end{cases}$
Exercise 2.18 $\frac{3}{4}$.
Exercise 2.19 (1) $\alpha = p^2$. $P((k, m) : k \ge m) = \frac{1}{2 - p}$.
(2) $P((k, m) : k + m = r) = p^2 (1 - p)^{r-2}(r - 1)$.
(3) $P((k, m) : k \text{ is an odd number}) = \frac{1}{2 - p}$.
Exercise 2.20 $P(A \cap B) = \frac{3}{10}$. $P(A|B) = \frac{3}{5}$. $P(B|A) = \frac{3}{7}$.
Exercise 2.21 Probability that only two will hit the target $= \frac{398}{1000}$.
Exercise 2.22 (4) $P(B^c|A) = 1 - p$ or $\frac{1 - s - q + qr}{r}$. $P(B|A^c) = 1 - q$ or $\frac{s - rp}{1 - r}$.
$P(A^c|B) = \frac{(1-q)(1-r)}{s}$ or $\frac{s - rp}{s}$. $P(A|B^c) = \frac{(1-p)r}{1-s}$ or $\frac{1 - s - q + qr}{1-s}$.
P ( X = 1| Y = 0) = P ( X = 0| Y = 1) = 1 − p11 .
P ( X = 1| Y = 1) = P ( X = 0| Y = 0) = p11 .
Exercise 2.33 (1) $\alpha_{1,1} = \frac{m(m-1)}{n(n-1)}$. $\alpha_{0,1} = \frac{m(n-m)}{n(n-1)}$. $\alpha_{1,0} = \frac{m(n-m)}{n(n-1)}$. $\alpha_{0,0} = \frac{(n-m)(n-m-1)}{n(n-1)}$.
(2) $\tilde{\alpha}_{1,1} = \frac{m^2}{n^2}$. $\tilde{\alpha}_{0,1} = \frac{m(n-m)}{n^2}$. $\tilde{\alpha}_{1,0} = \frac{m(n-m)}{n^2}$. $\tilde{\alpha}_{0,0} = \frac{(n-m)^2}{n^2}$.
(3) $\beta_0 = \frac{(n-m)(n-m-1)}{n(n-1)}$. $\beta_1 = \frac{2m(n-m)}{n(n-1)}$. $\beta_2 = \frac{m(m-1)}{n(n-1)}$.
Exercise 2.37 $p_k = \dfrac{\left(\frac{q}{p}\right)^k - \left(\frac{q}{p}\right)^N}{1 - \left(\frac{q}{p}\right)^N}$.
(In the pdf, the set {1 < y ≤ 2, 2 < y ≤ 5} can be replaced with {1 < y ≤ 2, 2 <
y < 5}, {1 < y < 2, 2 ≤ y ≤ 5}, {1 < y < 2, 2 ≤ y < 5}, {1 < y < 2, 2 < y ≤
5}, or {1 < y < 2, 2 < y < 5}.) 2√
0, √ y ≤ 1; y − 1, 1 ≤ y ≤ 2;
cdf: FY (y) = 1 1 3
+ y − 1, 2 ≤ y ≤ 5; 1, y ≥ 5.
3 3 ⎧
Exercise 3.7 cdf: $F_Y(y) = \begin{cases} 0, & y < -18, \\ \frac{1}{7}, & -18 \le y < -2, \\ \frac{3}{7}, & -2 \le y < 0, \\ \frac{4}{7}, & 0 \le y < 2, \\ \frac{6}{7}, & 2 \le y < 18, \\ 1, & y \ge 18. \end{cases}$
Exercise 3.8 $E\left\{X^{-1}\right\} = \frac{1}{n-2}$, $n = 3, 4, \ldots$.
Exercise 3.9 f Y (y) = FX (y)δ(y) + f X (y)u(y) = FX (0)δ(y) + f X (y)u(y), where
FX is the cdf of X . ⎧
⎪
⎨ 0, √ y ≤ 0,
θ
Exercise 3.10 f Y (y) = e√ y cosh θ y , 0 < y ≤ θ2 ,
1
⎪ √
⎩ √ exp −θ y , y > 2 .
θ 1
2e y θ
FX (x)−FX (b)
Exercise 3.11 FX |b<X ≤a (x) = FX (a)−FX (b)
.
f X (x)
f X |b<X ≤a (x) = FX (a)−FX (b)
u(x − b)u(a − x)
. Var{X |X > a} = (1−a)
2
Exercise 3.12 E{X |X > a} = 1+a 2 12
.
Exercise 3.13 P(950 ≤ R ≤ 1050) = 2 . 1
Exercise 3.14 Let X be the time to take to the location of the appointment with cdf
FX . Then, departing t ∗ minutes before the appointment time will incur the minimum
cost, where FX (t ∗ ) = k+c k
.
Exercise 3.15 $P(X \le \alpha) = \frac{1}{3}u(\alpha) + \frac{2}{3}u(\alpha - \pi)$. $P(2 \le X < 4) = \frac{2}{3}$. $P(X \le 0) = \frac{1}{3}$.
Exercise 3.16 $P(U > 0) = \frac{1}{2}$. $P\!\left(|U| < \frac{1}{3}\right) = \frac{1}{3}$. $P\!\left(|U| \ge \frac{3}{4}\right) = \frac{1}{4}$. $P\!\left(\frac{1}{3} < U < \frac{1}{2}\right) = \frac{1}{12}$.
Exercise 3.17 $P(A) = P(A \cup B) = \frac{1}{32}$. $P(B) = P(A \cap B) = \frac{1}{1024}$. $P(B^c) = \frac{1023}{1024}$.
⎪
⎪ 0, y ≤ −1,
⎨4 −1
− 4
cos y, −1 ≤ y ≤ 0,
(3) With 0 ≤ cos−1 y ≤ π, FY (y) = 3 3π −1
⎪ 1 − 3π cos y, 0 ≤ y ≤ 1,
⎪
2
⎩
1, y ≥ 1.
1
∞
Exercise 3.29 (1) f Y (y) = 1+y 2 f X (xi ). (2) f Y (y) = π 1+y
1
.
i=1
( 2)
(3) f Y (y) = π 1+y
1
.
( 2)
Exercise 3.30 When X ∼ U [0,1),
Y = − λ1 ln(1 − X ) ∼ FY (y) = 1 − e−λy u(y).
Exercise 3.31 expected value: E{X } = 3.5. mode: 1, 2, . . . , or 6.
median: any real number in the interval [3, 4].
Exercise 3.32 c = 5 < 101 7
= b.
Exercise 3.33 E{X } = 0. Var{X } = λ22 .
Exercise 3.34 E{X } = α+β α
. Var{X } = (α+β)2αβ (α+β+1)
.
Exercise 3.36 f Y (y) = u(y + 2) − u(y + 1). f Z (z) = 21 {u(z + 4) − u(z + 2)}.
f W (w) = 21 {u(w + 3) − u(w + 1)}.
Exercise 3.37 pY (k) = 41 , k = 3, 4, 5, 6. p Z (k) = 41 , k = −1, 0, 1, 2.
pW (r ) = 41 , r = ± 13 , 0, 15 .
c
− c − 0 FX (x)d x, c ≥ 0,
Exercise 3.39 (2) E X c =
c, c < 0.
∞ c
+ {1 − FX (x)}d x + 0 FX (x)d x, c ≥ 0,
(3) E X c = 0∞
0 {1 − FX (x)}d x, c < 0.
Exercise 3.40 (1) E{X } = λμ. Var{X } = (1 + λ)λμ.
Exercise 3.41 f X (x) = 4πρx 2 exp − 43 πρx 3 u(x).
Exercise 3.42 A = 16 1
. P(X ≤ 6) = 78 .
Exercise 3.44 E{F(X )} = 21 .
Exercise 3.45 M(t) = t+1 1
. ϕ(ω) = 1+1jω . m 1 = −1. m 2 = 2. m 3 = −6. m 4 = 24.
π t
Exercise 3.46 mgf M(t) = π2 02 (tan x) π d x.
Exercise 3.47 α = 2n−1 B̃|β|n , n .
(2 2)
√
Exercise 3.48 M R (t) = 1 + 2πσt exp σ 2t Φ (σt), where Φ is the standard nor-
2 2
mal cdf.
1 − 4y , 0 ≤ y < 1; 21 − 4y , 1 ≤ y < 2;
Exercise 3.51 f Y (y) =
0, otherwise.
Exercise 3.52 A cdf such that n
∞
1 i 1 n+1
(locaton of jump, height of jump) = a + (b − a) 2
, 2
i=1 n=0
and the interval between
adjacent jumps are all the same.
Exercise 3.53⎧ (1) a ≥ 0, a + 13 ≤ b ≤ −3a + 1 .
1√
⎨ 0, √ x < 0; 4 √
x, 0 ≤ x < 1;
(2) FY (x) = 24 1
11 x − 1 , 1 ≤ x < 4; 18 x + 5 , 4 ≤ x < 9;
⎩
1, x ≥ 9.
P(Y = 1) = 6 . P(Y = 4) = 0.
1
5√
Exercise 3.54 FY (x) = 1 √
0, x < 0; 8
x, 0 ≤ x < 1;
⎧8 x + 4 , 1 ≤ x < 16; 1, x ≥ 16.
⎪
⎪ F X (α), y < −2 or y > 2,
⎨
FX (−2) + p X (1), −2 ≤ y < 0,
Exercise 3.57 FY (y) =
⎪
⎪ FX (−2) + p X (0) + p X (1), 0 ≤ y < 2,
⎩
FX (2), y = 2,
where α is the only real root of y = x 3 − 3x when y > 2 or y < −2.
Exercise 3.58 FY (x) = 0 for x < 0, x for 0 ≤ x < 1, and 1 for x ≥ 1.
Exercise 3.59 (1) For α(θ) = 21 , ϕ(ω) = exp − ω4 .
2
Exercise 4.8 ⎧ √ √
⎪ √1 , 0 < y1 < 1 , − y1 + 1 < y2 < y1 + 21 ,
⎪
⎪
⎨ √1 , 0 < y < 1 , √ y − 1 < y < −√ y + 1 ,
2 y1 4 2
1 1 2 1
f Y (y1 , y2 ) = y1 4
√ 2 √ 2
⎪
⎪ √1 , 0 < y1 < 1 , − y1 − 1 < y2 < y − 1
,
⎪
⎩ 2 y1 4 2 1 2
0, otherwise.
f Y1 (y) = √1y u(y)u 41 − y . f Y2 (y) = (1 − |y|)u(1 − |y|).
⎧
⎪
⎪ 0, w < 0,
⎪
⎨ π w2 , 0 ≤ w < 1,
Exercise 4.9 FW (w) =
4
π
√
−1 w 2 −1
√ √
⎪
⎪ − sin w
w + w − 1, 1 ≤ w < 2,
2 2
⎪
⎩
4
√
⎧π 1, w ≥ 2.
⎪ 2 w,
⎨ 0 ≤ w < 1,
π
√
−1 w 2 −1
√
f W (w) = 2 4 − sin w
w, 1 ≤ w < 2,
⎪
⎩
0, otherwise.
⎧
⎨ 1 − e−(μ+λ)w , if w ≥ 0, v ≥ 1,
μ
Exercise 4.10 FW,V (w, v) = μ+λ 1 − e−(μ+λ)w , if w ≥ 0, 0 ≤ v < 1,
⎩
0, otherwise.
2
Exercise 4.11 fU (v) = (1+v)4 u(v).
3v
) √ √
Exercise 4.12 (1) f Y y1 , y2 = 2u(y √ 1 u 1 − y1 u y2 −
y1
y1 u 1 − y2 + y1 .
⎧
⎨ y, 0 < y ≤ 1,
f Y1 (y) = 2√1 y u(y)u(1 − y). f Y2 (y) = 2 − y, 1 < y ≤ 2,
⎩
0, otherwise.
√ √
(2) f Y (y1 , y2 ) = √1y1 u (y1 ) u 1 − y2 + y1 u y2 − 2 y1 .
⎧
1 ⎨ y2 , 0 < y2 ≤ 1,
√ − 1, 0 < y1 ≤ 1,
f Y1 (y1 ) = y1
f Y2 (y2 ) = 2 − y2 , 1 < y2 ≤ 2,
0, otherwise. ⎩
0, otherwise.
Exercise 4.13 (1) f Y (y1 , y2 ) = √ y11+y2 u (y1 + y2 ) u (1 − y1 − y2 ) u (y1 − y2 ) u
(1 − y1 + y2 ).√ √
2 2y1 , 0 < y1 ≤ 21 ; 2 1 − 2y1 − 1 , 21 < y1 ≤ 1;
f Y1 (y1 ) =
0, otherwise.
√ √
2 2y2 + 1, − 21 < y2 ≤ 0; 2 1 − 2y2 , 0 < y2 ≤ 21 ;
f Y2 (y2 ) =
0, otherwise.
√
(2) f Y (y1 , y2 ) = √ y12+y2 u (y1 + y2 ) u (1 − y1 + y2 ) u y1 − y2 − y1 + y2 .
⎧ √
⎨ 2 √8y1 + 1 − 1 , √ 0 < y1 ≤ 21 ,
f Y1 (y1 ) = 2 8y1 + 1 − 1 − 4 2y1 − 1, 21 < y1 ≤ 1,
⎩
⎧ 0,√ otherwise.
⎨ 4 √2y2 + 1, √ − 1
2
< y2 ≤ − 18 ,
f Y2 (y2 ) = 4 2y2 + 1 − 8y2 + 1 , − 8 < y2 ≤ 0, 1
⎩
0, otherwise.
z α1 +α2 −1
Exercise 4.14 f Z (z) = β α1 +α2 Γ (α1 +α2 )
exp − βz u(z).
Γ (α1 +α2 ) w α1 −1
f W (w) = Γ (α1 )Γ (α2 ) (1+w)α1 +α2
u(w).
y 2r1 − r −1 1 − 21
Exercise 4.15 (1) f Y1 (y1 ) = 2r1 u (y1 ) 1 1 y1 r y1r − y22
−y12r
1 1
fX y1r − y22 , y2 + f X − y1r − y22 , y2 dy2 .
⎧
⎨ 1, w ≥ 1,
2w, w ∈ [0, 1],
(3) For r = 21 , FW (w) = w 2 , w ∈ [0, 1], f W (w) =
⎩ 0, otherwise.
⎧ 0, otherwise.
⎨ 1, w ≥ 1,
1, w ∈ [0, 1],
For r = 1, FW (w) = w, w ∈ [0, 1], f W (w) =
⎩ 0, otherwise.
0, otherwise.
0, w < 1, 0, w < 1,
For r = −1, FW (w) = f (w) =
1 − w −1 , w ≥ 1. W
w −2 , w > 1.
Exercise
⎧1 4.16 (1) f Y1 ,Y2 (y1 , y2 )
⎪ 2 (y1 − |y2 |) ,
⎪ (y1 , y2 ) ∈ (1 : 3) ∪ (2 : 3),
⎪
⎪
⎨ 1 − |y2 | , (y1 , y2 ) ∈ (1 : 2) ∪ (2 : 1),
= 21 (3 − y1 − |y2 |) , (y1 , y2 ) ∈ (3 : 2) ∪ (3 : 1),
⎪
⎪
⎪ 21 ,
⎪ (y1 , y2 ) ∈ (3 : 3),
⎩
0, otherwise.
(refer to Fig. A.1).
(2) f Y2 (y) = (1 − |y|)u (1 − |y|).
(1 : 3) (3 : 2)
1 2 3
(3 : 3)
0 y1
(2 : 3) (3 : 1)
(2 : 1)
−1
⎧1 2
⎪
⎪ y , 0 ≤ y ≤ 1,
⎨2 2
−y + 3y − 23 , 1 ≤ y ≤ 2,
(3) f Y1 (y) =
⎪ 2 (3 − y) ,
⎪ 2 ≤ y ≤ 3,
1 2
⎩
0, y ≤ 1, y ≥ 3.
Exercise
⎧ 4.24 FX,Y |A (x, y) =
⎪
⎪ 1, region 1−3,
⎪
⎪
⎪
⎪ 1 − πa1 2 −xψ(x) + a 2 θx − π2 a 2 , region 1−2,
⎪
⎪
⎪
⎪ 1 − πa1 2 −yψ(y) + a 2 θ y − π2 a 2 , region 1−4,
⎪
⎪
⎪
⎪ 1 − πa1 2 a 2 cos−1 ay − yψ(y)
⎪
⎪
⎪
⎪
⎨ −xψ(x) + a 2 θx − π2 a 2 , region 1−5,
⎪
⎪ 1
xy − a2
θ + y
ψ(y)
⎪
⎪ πa 2 2 y 2
⎪
⎪
⎪
⎪ + x2 ψ(x) − a2
cos−1 x
+ πa 2 , region 1−1 or 2−1,
⎪
⎪ 2 a
⎪
⎪
⎪
⎪ 1
xy − a2
θx + x2 ψ(x)
⎪
⎪ πa 2 2
⎪
⎪
⎩ + 2y ψ(y) − a2
cos−1 y
+ πa 2 , region 4−1 or 1−1,
⎧ 2 a
⎪
⎪ 1
xy − a2
cos−1 y
+ 2y ψ(y)
⎪
⎪ πa 2 2 a
⎪
⎪
⎪
⎪ + x2 ψ(x) + a2
θ , region 2−1 or 3−1,
⎪
⎪ 2 x
⎪
⎪
⎨ 12 a2 −1 x
x y − 2 cos a + x2 ψ(x)
πa
⎪
⎪
2
+ 2y ψ(y) + a2 θ y , region 3−1 or 4−1,
⎪
⎪
⎪
⎪ π 2
xψ(x) + a 2 θx − π2 a 2 ,
1
⎪
⎪ πa 2
2
region 2−2,
⎪
⎪
⎪
⎪
1
yψ(y) + a θ y − 2 a , region 4−2,
⎩ πa 2
0, otherwise.
√
Here, ψ(t) = a 2 − t 2 , θw = cos−1 −ψ(w) , and ‘region’ is shown in Fig. A.2.
a
f X,Y |A (x, y) = πa1 2 u a 2 − x 2 − y 2 .
−a a u
3−1 4−1 4−2
3−2 −a 4−3
Exercise 4.28
⎛ A1 linear transformation
⎞ transforming X into an uncorrelated random
√ − √1 0
⎜ 2 2
⎟
vector: A = ⎝ √16 √16 − √26 ⎠.
√1 √1 √1
3 3 3
a linear transformation
⎛ 1 transforming
⎞ X into an uncorrelated random vector with unit
√ − √1 0
⎜ 2 2
⎟
variance: ⎝ √16 √16 − √26 ⎠.
1
√ 1
√ 1
√
2 3 2 3 2 3
Exercise 4.29 f Y (y) = exp(−y)u(y).
Exercise 4.30 p X (1) = 58 , p X (2) = 38 . pY (1) = 43 , pY (2) = 41 .
Exercise 4.31 pmf of M = max (X 1 ,X 2 ):
−2λ λm
m
λk
− λm! ũ(m).
m
P(M = m) = e m!
2 k!
k=0
pmf of N = min (X 1 ,X 2 ):
∞
P(N = n) = e−2λ λn! 2 λk λn
n
k!
+ n!
ũ(n).
k=n+1
Exercise 4.32 f W (w) = 21 {u(w) − u(w − 2)}. fU (v) = u(v + 1) − u(v).
0, z ≤ −1 or z > 2; z+1 , −1 < z ≤ 0;
f Z (z) = 1 2
, 0 < z ≤ 1; 2−z
, 1 < z ≤ 2.
2
⎧ 1
3 2
2
⎪
⎪ t + , − 23 ≤ t < − 21 ,
⎨ 23 2
− t 2, − 21 ≤ t < 21 ,
Exercise 4.33 f Y (t) = 41 E Y 4 = 8013
.
⎪
⎪ t − 3 2 1
, ≤ t < 3
,
⎩2 2 2 2
0, t > 23 or t < − 23 .
Exercise 4.34 f Y ( y) = f X 1 (y1 ) f X 2 (y2 − y1 ) · · · f X n (yn − yn−1 ), where Y = (Y1 ,
Y2 , . . . , Yn ) and y = (y1 , y2 , . . . , yn ).
Exercise 4.35 (1) p X (x) = 2x+5 16
, x = 1, 2. pY (y) = 3+2y 32
, y = 1, 2, 3, 4.
(2) P(X > Y ) = 32 . P(Y = 2X ) = 32
3 9
. P(X + Y = 3) = 16 3
.
P(X ≤ 3 − Y ) = 4 . (3) not independent.
1
0, y ≤ 0; 1 − e−y , 0 < y < 1;
Exercise 4.36 f Y (y) = −y
(e − 1)e , y ≥ 1.
Exercise 4.37 (1) MY (t) = exp 7 et − 1 . (2) Poisson distribution P(7).
Exercise 4.38 k = 23 . f Z |X,Y (z|x, y) = x+y+ x+y+z
1 , 0 ≤ x, y, z ≤ 1.
2
Exercise 4.39 E{exp(−Λ)|X = 1} = 49 .
Exercise 4.40 f X,Y,Z (x, y, z) = fU (x) fU2 (y − x) fU3 (z − y).
1
Exercise 4.41 (1) f X,Y (x, y) = 2π 1 − x 2 − y 2 u 1 − x 2 − y 2 .
3
f X (x) = 43 1 − x 2 u(1 − |x|).
(2) f X,Y |Z (x, y|z) = π 1−z
1
2 u 1 − x − y − z
2 2 2
.
( )
not independent of each other. x+r
2 , −r ≤ x ≤ 0,
Exercise 4.42 (1) c = 2r12 . f X (x) = r r−x
r2
, 0 ≤ x ≤ r.
(2) not independent of each other. (3) f Z (z) = 2z r2
u(z)u(r − z).
√
Exercise 4.44 (1) c = π8 . f X (x) = f Y (x) = 3π 8
1 + 2x 2 1 − x 2 u(x)u(1 − x).
X and Y are not independent of each other.
(2) f R,θ (r, θ) = π8 r 3 , 0 ≤ r < 1, 0 ≤ θ < π2 . (3) p Q (q) = 18 , q = 1, 2, . . . , 8.
Exercise 4.45 probability that the battery with pdf of lifetime f lasts longer than
μ
that with g= λ+μ . When λ = μ, the probability is 21 .
Exercise 4.46 (1) fU (x) = xe−x u(x). f V (x) = 21 e−|x| .
∞ g
f X Y (g) = 0 e−x e− x x1 d xu(g). f YX (w) = (1+w) u(w)
2.
= 21 2 − (1 − y)2 , x ≥ 1, 0 ≤ y ≤ 1,
⎪
⎪
⎪
⎪
1
2 − (1 − x) 2
− (1 − y) 2
, x ≤ 1, y ≤ 1, y ≥ −x + 1,
⎪
⎪
2
⎪
⎪
1
(x
4
+ y + 1) 2
, x ≤ 0, y ≤ 0, y ≥ −x − 1,
⎪
⎪
⎪
⎪
1
4
(x + y + 1) 2
− 2x 2
, x ≥ 0, y ≤ 0, y ≥ x − 1,
⎪
⎪
⎪
⎪
1
4
(x + y + 1) 2
− 2y 2
, x ≤ 0, y ≥ 0, y ≤ x + 1,
⎪
⎪
⎪
⎪
1
(x +
y + 1)
2
⎩4
−2 x 2 + y 2 , x ≥ 0, y ≥ 0, y ≤ −x + 1.
f X,Y |B (x, y) = 2 u (1 − |x| − |y|).
1
Exercise 5.1 (3) The vector (X, Y ) is not a bi-variate normal random vector.
(4) The random variables X and Y are not independent of each other.
exp{− 21 (x 2 +y 2 )} 2
Exercise 5.2 f 2 (x, y) =
X 1 ,X 2 |X 1 +X 2 <a 2
2 u a − x 2 − y2 .
2
2π 1−exp − a2
u(t) u v + π2 − u v − π2 .
2
Exercise 5.3 (1) fU,V (t, v) = t
π
exp − t2
exp − 2t 2+v u(v).
2
(2) fU,V (t, v) = √1
π 2v
2
Exercise 5.4 f Y |X (y|x) = √13π exp − 13 y − 4 − x−3
√
2
.
2
f X |Y (x|y) = 3π
2
exp − 23 x − 3 − 2y−4
√
2
.
Exercise 5.5 ρ Z W =
(σ22 −σ12 ) cos θ sin θ .
( ) cos2 θ sin2 θ+σ12 σ22 (cos2 θ−sin2 θ)2
σ12 +σ22
2
Exercise 5.10 f C (r, θ) = r f X (r cos θ, r sin θ) u (r ) u(π − |θ|). The random vector
C is an independent
random
vector when X is an i.i.d. random vector with marginal
distribution N 0, σ 2 .
Exercise 5.12 The conditional distribution of X 3 when X 1 = X 2 = 1 is N (1, 2).
Exercise 5.13 acσ 2X + (bc + ad)ρσX σY + bdσY2 = 0.
2 2 5.14 E{X Y 2}= 2ρσ2X σY . E X Y = 0. E X Y = 3ρσ X σY .
2 3 3
Exercise
E X Y = 1 + 2ρ σ X σY .
Exercise
5.16
E Z 2 W 2 = 1 + 2ρ2 σ12 σ22 + m 22 σ12 + m 1 σ2 + 2m 1 m4 2 2+ 4m 1 m 2 ρσ1 σ2 .
2 2 2 2
(2)
Exercise 5.36 E{H } = m(n−2) n(m+δ)
for n > 2.
2n 2 {(m+δ)2 +(n−2)(m+2δ)}
Var{H } = m 2 (n−4)(n−2)2
for n > 4.
μ4 3(n−1)μ22
Exercise 5.40 μ4 X n = n 3 + n 3 .
Exercise 5.42 (1) pdf: f V (v) = e−v u(v). cdf: FV (v) = 1 − e−v u(v).
ρ
2β 2 ρ sin−1
Exercise 5.43 RY = π
sin−11+α 2 . ρ Y = sin−1
1+α2
1 .
1+α2
ρY |α2 =1 = 6
π
sin−1 ρ2 . lim ρY = π sin ρ.
2 −1
α2 →0
Exercise 6.1 {X n (ω)} converges almost surely, but not surely, to X (ω).
Exercise 6.2 {X n (ω)} converges almost surely,but not surely,
to X (ω).
Exercise 6.3 (1) $S_n \sim NB(n, \alpha)$. (2) $S_n \sim NB\!\left(\sum_{i=1}^{n} r_i, \alpha\right)$.
(3) $S_n \sim P\!\left(\sum_{i=1}^{n} \lambda_i\right)$. (4) $S_n \sim G\!\left(\sum_{i=1}^{n} \alpha_i, \beta\right)$. (5) $S_n \sim C\!\left(\sum_{i=1}^{n} \mu_i, \sum_{i=1}^{n} \theta_i\right)$.
Exercise 6.4 $\frac{S_n}{n} \xrightarrow{p} m$.
Exercise 6.7 The weak law of large numbers holds true.
Exercise 6.10 P (50 < Sn ≤ 80) = P (50.5 < Sn < 80.5) ≈ 0.9348.
Exercise 6.11 E{U } = p(1 − p) + np 2 . E{V } = np 2 .
Exercise 6.12 mgf of $Y$: $M_Y(t) = \frac{1}{2 - e^{2t}}$. expected value $= 2$. variance $= 8$.
2σ 2
.
2πσ
(4) X̂ n is mean square convergent to X .
Exercise 6.26 an = nσ 2 . bn = nμ4 .
Exercise 6.27 α ≤ 0.
Exercise 6.29 Exact value: $P(S \ge 3) = 1 - 5e^{-2} \approx 0.3233$.
Approximate value: $P(S \ge 3) = P\!\left(\frac{S-2}{\sqrt{2}} \ge \frac{1}{\sqrt{2}}\right) = P\!\left(Z \ge \frac{1}{\sqrt{2}}\right) \approx 0.2398$, or $P(S \ge 3) = P(S > 2.5) = P\!\left(Z > \frac{1}{2\sqrt{2}}\right) \approx 0.3618$.
Exercise 6.30 $\lim_{n\to\infty} F_n(x) = F(x)$, where $F$ is the cdf of $U[0, 1)$.
$\frac{d}{dx}\left[\lim_{n\to\infty} F_n(x)\right] \ne \lim_{n\to\infty}\left[\frac{d}{dx} F_n(x)\right]$.
Exercise 6.32 Distribution of points: Poisson with parameter μ p.
mean: μ p = 1.5. variance: μ p = 1.5.
Exercise 6.33 Distribution of the total information sent via the fax: geometric distribution with expected value $\frac{1}{\alpha\beta} = 4 \times 10^5$.
Exercise 6.34 Expected value: $E\{S_N\} = \frac{1}{p\lambda}$. Variance: $\mathrm{Var}\{S_N\} = \frac{1}{p^2\lambda^2}$.
Index
  moment, 360
Generalized Bienayme-Chebyshev inequality, 451
Generalized Cauchy distribution, 398
  moment, 410
Generalized central limit theorem, 396
Generalized function, 36
Generalized Gaussian distribution, 397
  moment, 409
Generalized normal distribution, 397
  moment, 409
Geometric distribution, 127, 226, 252
  cf, 200
  expected value, 250
  mean, 250
  skewness, 205
  sum, 455
  variance, 250
Geometric sequence, 80
  difference, 80
Gödel pairing function, 86
Greatest lower bound, 55

H
Hagen-Rothe identity, 69
Half-closed interval, 4
Half mean, 250, 334, 364
  logistic distribution, 251
Half moment, 250, 364
Half-open interval, 4
Half-wave rectifier, 242, 335
Hazard rate function, 226
Heaviside convergence sequence, 33
Heaviside function, 33
Heaviside sequence, 33
Heavy-tailed distribution, 396
Heine-Cantor theorem, 27
Heredity, 99
Hermitian adjoint, 290
Hermitian conjugate, 290
Hermitian matrix, 292
Hermitian transpose, 290
Hölder inequality, 454
Hybrid random vector, 256
Hypergeometric distribution, 173, 327
  expected value, 251
  mean, 251
  moment, 251
  variance, 251
Hypergeometric function, 69, 364

I
Image, 19
  inverse image, 20
  pre-image, 20
Impulse-convergent sequence, 39
Impulse function, 33, 36, 137
  symbolic derivative, 41
Impulse sequence, 39
Impulsive distribution, 396
Incomplete mean, 250, 334, 364
Incomplete moment, 250, 364
Increasing singular function, 31
Independence, 123, 350
  mutual, 124
  pairwise, 124
Independent and identically distributed (i.i.d.), 269
Independent events, 123
  a number of independent events, 124
Independent random vector, 266
  several independent random vectors, 270
  two independent random vectors, 269
Index set, 100
Indicator function, 147
In distribution, 226, 422
Inequality, 108, 448
  absolute mean inequality, 449
  Bienayme-Chebyshev inequality, 451
  Bonferroni inequality, 110
  Boole inequality, 108, 141
  Cauchy-Schwarz inequality, 290, 451
  Chebyshev inequality, 449
  Chernoff bound, 453
  generalized Bienayme-Chebyshev inequality, 451
  Hölder inequality, 454
  Jensen inequality, 450
  Kolmogorov inequality, 452
  Lipschitz inequality, 26
  Lyapunov inequality, 423, 450
  Markov inequality, 425, 449
  Minkowski inequality, 454
  tail probability inequality, 448
  triangle inequality, 454
Infimum, 144
Infinite dimensional vector space, 100
Infinitely often (i.o.), 61
Infinite set, 3
Inheritance, 99
Injection, 21
Injective function, 21
In probability, 109
Integral, 79