
Iickho Song
So Ryoung Park
Seokho Yoon

Probability and Random Variables: Theory and Applications
Iickho Song
School of Electrical Engineering
Korea Advanced Institute of Science and Technology
Daejeon, Korea (Republic of)

So Ryoung Park
School of Information, Communications, and Electronics Engineering
The Catholic University of Korea
Bucheon, Korea (Republic of)

Seokho Yoon
College of Information and Communication Engineering
Sungkyunkwan University
Suwon, Korea (Republic of)

ISBN 978-3-030-97678-1
ISBN 978-3-030-97679-8 (eBook)
https://doi.org/10.1007/978-3-030-97679-8

Translation from the Korean language edition: “Theory of Random Variables” by Iickho Song ©
Saengneung 2020. Published by Saengneung. All Rights Reserved.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To
our kin and academic ancestors and families
and
to
all those who appreciate and enjoy
the beauty of thinking and learning

To
Professors
Souguil J. M. Ann,
Myung Soo Cha,
Saleem A. Kassam, and
Jordan M. Stoyanov
for their invisible yet enlightening guidance
Preface

This book is a translated version, with some revisions, of Theory of Random
Variables, originally written in Korean by the first author in 2020. It is
intended primarily for those who wish to advance one step beyond a basic level
of knowledge and experience of probability and random variables. At the same
time, it should also be a good resource for experienced scholars wishing to review and
refine familiar concepts. For these purposes, the authors have included definitions
of basic concepts in clear terms, key advanced concepts in mathematics, and diverse
concepts and notions of probability and random variables, together with a significant
number of examples and exercise problems.
The organization of this book is as follows: Chap. 1 describes the theory of sets and
functions. The unit step function and impulse function, to be used frequently in the
following chapters, are also discussed in detail, and the gamma function and binomial
coefficients in the complex domain are introduced. In Chap. 2, the concept of sigma
algebra is discussed, which is the key for defining probability logically. The notions
of probability and conditional probability are then discussed, and several classes of
widely used discrete and continuous probability spaces are introduced. In addition,
important notions of probability mass function and probability density function are
described. After discussing another important notion, the cumulative distribution
function, Chap. 3 is devoted to the notions of random variables and moments,
as well as to the transformations of random variables.
In Chap. 4, the concept of random variables is generalized into random vectors,
also referred to as joint random variables. Transformations of random vectors are
discussed in detail. The discussion on the applications of the unit step function and
impulse function in random vectors in this chapter is a unique trait of this book.
Chapter 5 focuses on the discussion of normal random variables and normal random
vectors. The explicit formula for the joint moments of normal random vectors, another
unique trait of this book, is delineated in detail. Three statistics from normal samples
and three classes of impulsive distributions are also described in this chapter. In
Chap. 6, the authors briefly describe the fundamental aspects of the convergence of
random variables. The central limit theorem, one of the most powerful and useful
results with practical applications, is among the key expositions in this chapter.


The unique features of this book include, but are not limited to, interesting applications
of impulse functions to random vectors, exposition of the general formula for the
product moments of normal random vectors, discussion on gamma functions and
binomial coefficients in the complex space, detailed procedures to the final answers
for almost all results presented, and a substantially useful and extensive index for
finding subjects more easily. More than 320 exercise problems are included; a
complete solution manual for all of them is available from the authors through
the publisher.
The authors are sincerely thankful that, as with the publication of any book,
the publication of this book became a reality thanks to the help of many people
in a variety of ways. Unfortunately, only some of them can be mentioned explicitly
here: the anonymous reviewers
for constructive and helpful comments and suggestions, to Bok-Lak Choi and
Seung-Ki Kim at Saengneung for allowing the use of the original Korean title,
to Eva Hiarapi and Yogesh Padmanaban at Springer Nature for extensive editorial
assistance, and to Amelia Youngwha Song Pegram and Yeonwha Song Wratil for
improving the readability. In addition, research grant 2018R1A2A1A05023192
from the Korea Research Foundation provided essential support in completing
the preparation of this book.
The authors would feel rewarded if everyone who spends time and effort wisely in
reading and understanding the contents of this book enjoys the pleasure of learning
and advancing one step further.
Thank you!

Daejeon, Korea (Republic of) Iickho Song


Bucheon, Korea (Republic of) So Ryoung Park
Suwon, Korea (Republic of) Seokho Yoon
January 2022
Contents

1 Preliminaries
  1.1 Set Theory
    1.1.1 Sets
    1.1.2 Set Operations
    1.1.3 Laws of Set Operations
    1.1.4 Uncountable Sets
  1.2 Functions
    1.2.1 One-to-One Correspondence
    1.2.2 Metric Space
  1.3 Continuity of Functions
    1.3.1 Continuous Functions
    1.3.2 Discontinuities
    1.3.3 Absolutely Continuous Functions and Singular Functions
  1.4 Step, Impulse, and Gamma Functions
    1.4.1 Step Function
    1.4.2 Impulse Function
    1.4.3 Gamma Function
  1.5 Limits of Sequences of Sets
    1.5.1 Upper and Lower Limits of Sequences
    1.5.2 Limit of Monotone Sequence of Sets
    1.5.3 Limit of General Sequence of Sets
  Appendices
  Exercises
  References
2 Fundamentals of Probability
  2.1 Algebra and Sigma Algebra
    2.1.1 Algebra
    2.1.2 Sigma Algebra
  2.2 Probability Spaces
    2.2.1 Sample Space
    2.2.2 Event Space
    2.2.3 Probability Measure
  2.3 Probability
    2.3.1 Properties of Probability
    2.3.2 Other Definitions of Probability
  2.4 Conditional Probability
    2.4.1 Total Probability and Bayes’ Theorems
    2.4.2 Independent Events
  2.5 Classes of Probability Spaces
    2.5.1 Discrete Probability Spaces
    2.5.2 Continuous Probability Spaces
    2.5.3 Mixed Spaces
  Appendices
  Exercises
  References
3 Random Variables
  3.1 Distributions
    3.1.1 Random Variables
    3.1.2 Cumulative Distribution Function
    3.1.3 Probability Density Function and Probability Mass Function
  3.2 Functions of Random Variables and Their Distributions
    3.2.1 Cumulative Distribution Function
    3.2.2 Probability Density Function
  3.3 Expected Values and Moments
    3.3.1 Expected Values
    3.3.2 Expected Values of Functions of Random Variables
    3.3.3 Moments and Variance
    3.3.4 Characteristic and Moment Generating Functions
    3.3.5 Moment Theorem
  3.4 Conditional Distributions
    3.4.1 Conditional Probability Functions
    3.4.2 Expected Values Conditional on Event
    3.4.3 Evaluation of Expected Values via Conditioning
  3.5 Classes of Random Variables
    3.5.1 Normal Random Variables
    3.5.2 Binomial Random Variables
    3.5.3 Poisson Random Variables
    3.5.4 Exponential Random Variables
  Appendices
  Exercises
  References
4 Random Vectors
  4.1 Distributions of Random Vectors
    4.1.1 Random Vectors
    4.1.2 Bi-variate Random Vectors
    4.1.3 Independent Random Vectors
  4.2 Distributions of Functions of Random Vectors
    4.2.1 Joint Probability Density Function
    4.2.2 Joint Probability Density Function: Method of Auxiliary Variables
    4.2.3 Joint Cumulative Distribution Function
    4.2.4 Functions of Discrete Random Vectors
  4.3 Expected Values and Joint Moments
    4.3.1 Expected Values
    4.3.2 Joint Moments
    4.3.3 Joint Characteristic Function and Joint Moment Generating Function
  4.4 Conditional Distributions
    4.4.1 Conditional Probability Functions
    4.4.2 Conditional Expected Values
    4.4.3 Evaluation of Expected Values via Conditioning
  4.5 Impulse Functions and Random Vectors
  Appendices
  Exercises
  References
5 Normal Random Vectors
  5.1 Probability Functions
    5.1.1 Probability Density Function and Characteristic Function
    5.1.2 Bi-variate Normal Random Vectors
    5.1.3 Tri-variate Normal Random Vectors
  5.2 Properties
    5.2.1 Distributions of Subvectors and Conditional Distributions
    5.2.2 Linear Transformations
  5.3 Expected Values of Nonlinear Functions
    5.3.1 Examples of Joint Moments
    5.3.2 Price’s Theorem
    5.3.3 General Formula for Joint Moments
  5.4 Distributions of Statistics
    5.4.1 Sample Mean and Sample Variance
    5.4.2 Chi-Square Distribution
    5.4.3 t Distribution
    5.4.4 F Distribution
  Appendices
  Exercises
  References
6 Convergence of Random Variables
  6.1 Types of Convergence
    6.1.1 Almost Sure Convergence
    6.1.2 Convergence in the Mean
    6.1.3 Convergence in Probability and Convergence in Distribution
    6.1.4 Relations Among Various Types of Convergence
  6.2 Laws of Large Numbers and Central Limit Theorem
    6.2.1 Sum of Random Variables and Its Distribution
    6.2.2 Laws of Large Numbers
    6.2.3 Central Limit Theorem
  Appendices
  Exercises
  References

Answers to Selected Exercises

Index
Chapter 1
Preliminaries

Sets and functions are key concepts that play an important role in understanding
probability and random variables. In this chapter, we discuss those concepts that will
be used in later chapters.

1.1 Set Theory

In this section, we introduce and review some concepts and key results in the theory
of sets (Halmos 1950; Kharazishvili 2004; Shiryaev 1996; Sommerville 1958).

1.1.1 Sets

Definition 1.1.1 (abstract space) The collection of all entities is called an abstract
space, a space, or a universal set.

Definition 1.1.2 (element) The smallest unit that comprises an abstract space is
called an element, a point, or a component.

Definition 1.1.3 (set) Given an abstract space, a grouping or collection of elements
of the abstract space is called a set.

An abstract space, often denoted by Ω or S, consists of elements or points, the
smallest entities that we shall discuss. In the strict sense, a set is the collection of
elements that can be clearly defined mathematically. For example, the collection
of ‘people who are taller than 1.5 m’ is a set. On the other hand, the collection of


‘tall people’ is not a set because ‘tall’ is not mathematically clear. Yet, in fuzzy set
theory, such a vague collection is also regarded as a set by adopting the concept of
membership function.
Abstract spaces and sets are often represented with braces { }: with all elements
explicitly shown, e.g., {1, 2, 3}; with the property of the elements described, e.g.,
{ω : 10 < ω < 20π}; or in compact form, e.g., {a_i} or {a_i}_{i=1}^n.
Example 1.1.1 The result of signal processing in binary digital communication can
be represented by the abstract space Ω = {0, 1}. The collection {A, B, . . . , Z } of
capital letters of the English alphabet and the collection S = {(0, 0, . . . , 0), (0, 0,
. . . , 1), . . . , (1, 1, . . . , 1)} of binary vectors are also abstract spaces. ♦
Example 1.1.2 In the abstract space Ω = {0, 1}, 0 and 1 are elements. The abstract
space of seven-dimensional binary vectors contains 2^7 = 128 elements. ♦
Example 1.1.3 The set A = {1, 2, 3, 4} can also be depicted as, for example, A =
{ω : ω is a natural number smaller than 5}. ♦
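The two descriptions of A in Example 1.1.3 can be checked mechanically in any language with a built-in set type. The following Python sketch, offered purely for illustration, builds the same set once by enumeration and once from the defining property (the search range 1..99 is an arbitrary choice, large enough to cover the candidates).

```python
# The set A of Example 1.1.3, described by enumerating its elements and by
# the property "omega is a natural number smaller than 5".
A_enum = {1, 2, 3, 4}
A_prop = {w for w in range(1, 100) if w < 5}  # property-based description
print(A_enum == A_prop)  # the two descriptions yield the same set
```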
Definition 1.1.4 (point set) A set with a single point is called a point set or a singleton
set.
Example 1.1.4 The sets {0}, {1}, and {2} are point sets. ♦
Consider an abstract space Ω and a set G of elements from Ω. When the element
ω does and does not belong to G, it is denoted by

ω ∈ G (1.1.1)

and ω ∉ G, respectively. Sometimes ω ∈ G is expressed as G ∋ ω, and ω ∉ G as
G ∌ ω.

Example 1.1.5 For the set A = {0, 1}, we have 0 ∈ A and 2 ∉ A. ♦
Definition 1.1.5 (subset) If all the elements of a set B belong to another set A, then
the set B is called a subset of A, which is expressed as B ⊆ A or A ⊇ B. When B
is not a subset of A, it is expressed as B ⊈ A or A ⊉ B.

Example 1.1.6 When A = {0, 1, 2, 3}, B = {0, 1}, and C = {2, 3}, it is clear that B ⊆ A
and A ⊇ C. The set A is not a subset of B because some elements of A are not
elements of B. In addition, B ⊈ C and C ⊈ B. ♦
Example 1.1.7 Any set is a subset of itself. In other words, A ⊆ A for any set A. ♦
Definition 1.1.6 (equality) If all the elements of A belong to B and all the elements
of B belong to A, then A and B are called equal, which is written as A = B.
Example 1.1.8 The set A = {ω : ω is a multiple of 25, larger than 15, and smaller
than 99} is equal to B = {25, 50, 75}, and C = {1, 2, 3} is equal to D = {3, 1, 2}. In
other words, A = B and C = D. ♦

Definition 1.1.7 (proper subset) When B ⊆ A and B ≠ A, the set B is called a
proper subset of A, which is denoted by B ⊂ A or A ⊃ B.

Example 1.1.9 The set B = {0, 1} is a proper subset of A = {0, 1, 2, 3}; that is, B ⊂ A.
♦

In some cases, ⊆ and ⊂ are used interchangeably.
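Subset, superset, and proper-subset relations map directly onto the comparison operators of Python's built-in set type; the following sketch, given only for illustration, replays Examples 1.1.6, 1.1.7, and 1.1.9.

```python
A = {0, 1, 2, 3}
B = {0, 1}
C = {2, 3}

print(B <= A)          # B is a subset of A
print(A >= C)          # A is a superset of C
print(A <= B)          # False: A is not a subset of B
print(B <= C, C <= B)  # False False: neither is a subset of the other
print(A <= A)          # every set is a subset of itself (Example 1.1.7)
print(B < A)           # B is a proper subset: B ⊆ A and B ≠ A
```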

Theorem 1.1.1 We have

A = B ⟺ A ⊆ B and B ⊆ A. (1.1.2)

In other words, two sets A and B are equal if and only if A ⊆ B and B ⊆ A.

As we shall see later in the proof of Theorem 1.1.4, Theorem 1.1.1 is especially
useful for proving the equality of two sets.

Definition 1.1.8 (empty set) A set with no point is called an empty set or a null set,
and is denoted by ∅ or { }.

Note that the empty set ∅ = { } is different from the point set {0} composed of one
element 0. One interesting property of the empty set is shown in the theorem below.

Theorem 1.1.2 An empty set is a subset of any set.

Example 1.1.10 For the sets A = {0, 1, 2, 3} and B = {1, 5}, we have ∅ ⊆ A and
{ } ⊆ B. ♦
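Theorem 1.1.2 and the remark preceding it can be checked directly with Python sets; this sketch, given for illustration only, mirrors Example 1.1.10. Note that in Python the literal `{}` denotes an empty dictionary, so the empty set must be written `set()`.

```python
A = {0, 1, 2, 3}
B = {1, 5}
empty = set()          # the empty set ∅

print(empty <= A)      # ∅ ⊆ A
print(empty <= B)      # ∅ ⊆ B
print(empty <= empty)  # ∅ ⊆ ∅ as well
print(empty == {0})    # False: ∅ is different from the point set {0}
```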

Definition 1.1.9 (finite set; infinite set) A set with a finite or an infinite number of
elements is called a finite or an infinite set, respectively.

Definition 1.1.10 (set of natural numbers; set of integers; set of real numbers) We
will often denote the sets of natural numbers, integers, and real numbers by

J+ = {1, 2, . . .}, (1.1.3)


J = {. . . , −1, 0, 1, . . .}, (1.1.4)

and

R = {x : x is a real number}, (1.1.5)

respectively.

Example 1.1.11 The set {1, 2, 3} is a finite set and the null set { } = ∅ is also a
finite set. The set {ω : ω is a natural number, 0 < ω < 10} is a finite set and {ω :
ω is a real number, 0 < ω < 10} is an infinite set. ♦

Example 1.1.12 The sets J+ , J, and R are infinite sets. ♦

Definition 1.1.11 (interval) An infinite set composed of all the real numbers between
two distinct real numbers is called an interval or an interval set.

Let a < b and a, b ∈ R. Then, the sets {ω : ω ∈ R, a ≤ ω ≤ b}, {ω : ω ∈ R, a <
ω < b}, {ω : ω ∈ R, a ≤ ω < b}, and {ω : ω ∈ R, a < ω ≤ b} are denoted by
[a, b], (a, b), [a, b), and (a, b], respectively. The sets [a, b] and (a, b) are called
closed and open intervals, respectively, and the sets [a, b) and (a, b] are both called
half-open and half-closed intervals.

Example 1.1.13 The set [3, 4] = {ω : ω ∈ R, 3 ≤ ω ≤ 4} is a closed interval and
the set (2, 5) = {ω : ω ∈ R, 2 < ω < 5} is an open interval. The sets (4, 5] = {ω :
ω ∈ R, 4 < ω ≤ 5} and [1, 5) = {ω : ω ∈ R, 1 ≤ ω < 5} are both half-closed
intervals and half-open intervals. ♦

Definition 1.1.12 (collection of sets) When all the elements of a ‘set’ are sets, the
‘set’ is called a set of sets, a class of sets, a collection of sets, or a family of sets.

A class, collection, and family of sets are also simply called class, collection,
and family, respectively. A collection with one set is called a singleton collection. In
some cases, a singleton set denotes a singleton collection similarly as a set sometimes
denotes a collection.

Example 1.1.14 When A = {1, 2}, B = {2, 3}, and C = { }, the set D = {A, B, C}
is a collection of sets. The set E = {(1, 2], [3, 4)} is a collection of sets. ♦

Example 1.1.15 Assume the sets A = {1, 2}, B = {2, 3}, C = {4, 5}, and D =
{{1, 2}, {4, 5}, 1, 2, 3}. Then, A ⊆ D, A ∈ D, B ⊆ D, B ∉ D, C ⊈ D, and C ∈ D.
Here, D is a set but not a collection of sets. ♦

Example 1.1.16 The collections A = {{3}} and B = {{1, 2}} are singleton collec-
tions, and C = {{1, 2}, {3}} is not a singleton collection. ♦

Definition 1.1.13 (power set) The class of all the subsets of a set is called the power
set of the set. The power set of Ω is denoted by 2^Ω.

Example 1.1.17 The power set of Ω = {3} is 2^Ω = {∅, {3}}. The power set of Ω =
{4, 5} is 2^Ω = {∅, {4}, {5}, Ω}. For a set with n elements, the power set is a collection
of 2^n sets. ♦
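A power set can be enumerated by collecting the subsets of every size from 0 to n. The helper `power_set` below is our own illustrative construction (not from the text), using frozensets so that the subsets themselves can be stored in a set; it reproduces Example 1.1.17 and the 2^n count.

```python
from itertools import combinations

def power_set(s):
    """Return the class of all subsets of s (2^n of them for an n-element set)."""
    elems = list(s)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

print(sorted(sorted(t) for t in power_set({4, 5})))  # [[], [4], [4, 5], [5]]
print(len(power_set({1, 2, 3, 4, 5})))               # 32 = 2^5
```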

Fig. 1.1 Set A and its complement A^c
1.1.2 Set Operations

Definition 1.1.14 (complement) For an abstract space Ω and its subset A, the com-
plement of A, denoted by A^c or Ā, is defined by

A^c = {ω : ω ∈ Ω, ω ∉ A}. (1.1.6)

Figure 1.1 shows a set and its complement via a Venn diagram.
Example 1.1.18 It is easy to see that Ω^c = ∅ and (B^c)^c = B for any set B. ♦
Example 1.1.19 For the abstract space Ω = {0, 1, 2, 3} and B = {0, 1}, we have
B^c = {2, 3}. The complement of the interval¹ A = (−∞, 1] is A^c = (1, ∞). ♦
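For a finite abstract space, complementation is just set difference against Ω. This Python sketch, given for illustration, verifies Example 1.1.19 and the identities of Example 1.1.18.

```python
Omega = {0, 1, 2, 3}   # the abstract space
B = {0, 1}
Bc = Omega - B         # complement of B within Omega

print(sorted(Bc))              # [2, 3]
print(Omega - Bc == B)         # (B^c)^c = B
print(Omega - Omega == set())  # Ω^c = ∅
```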
Definition 1.1.15 (union) The union or sum, denoted by A ∪ B or A + B, of two
sets A and B is defined by

A∪B = A+B
= {ω : ω ∈ A or ω ∈ B}. (1.1.7)

That is, A ∪ B denotes the set of elements that belong to at least one of A and B.
Figure 1.2 shows the union of A and B via a Venn diagram. More generally, the
union of {A_i}_{i=1}^n is² denoted by

∪_{i=1}^n A_i = A_1 ∪ A_2 ∪ · · · ∪ A_n. (1.1.8)

Example 1.1.20 If A = {1, 2, 3} and B = {0, 1}, then A ∪ B = {0, 1, 2, 3}. ♦


Example 1.1.21 For any two sets A and B, we have B ∪ B = B, B ∪ B^c = Ω,
B ∪ Ω = Ω, B ∪ ∅ = B, A ⊆ (A ∪ B), and B ⊆ (A ∪ B). ♦
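The union identities of Examples 1.1.20 and 1.1.21 can be confirmed with Python's `|` operator; the following is only an illustrative check.

```python
A = {1, 2, 3}
B = {0, 1}
Omega = {0, 1, 2, 3}

print(sorted(A | B))                  # [0, 1, 2, 3], as in Example 1.1.20
print(B | B == B)                     # B ∪ B = B
print(B | (Omega - B) == Omega)       # B ∪ B^c = Ω
print(B | set() == B)                 # B ∪ ∅ = B
print(A <= (A | B) and B <= (A | B))  # A ⊆ A ∪ B and B ⊆ A ∪ B
```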

¹ Because an interval assumes the set of real numbers by definition, it is not necessary to specify
the abstract space when we consider an interval.
² We often use braces also to denote a number of items in a compact way. For example, {A_i}_{i=1}^n
here represents A_1, A_2, . . ., A_n.

Fig. 1.2 Sum A ∪ B of A and B

Fig. 1.3 Intersection A ∩ B of A and B

Example 1.1.22 We have A ∪ B = B when A ⊆ B, and (A ∪ B) ⊆ C when A ⊆ C
and B ⊆ C. In addition, for four sets A, B, C, and D, we have (A ∪ B) ⊆ (C ∪ D)
when A ⊆ C and B ⊆ D. ♦

Definition 1.1.16 (intersection) The intersection or product, denoted by A ∩ B or
AB, of two sets A and B is defined by

A ∩ B = {ω : ω ∈ A and ω ∈ B}. (1.1.9)

That is, A ∩ B denotes the set of elements that belong to both A and B simultaneously.

The Venn diagram for the intersection of A and B is shown in Fig. 1.3. Meanwhile,

∩_{i=1}^n A_i = A_1 ∩ A_2 ∩ · · · ∩ A_n (1.1.10)

denotes the intersection of {A_i}_{i=1}^n.

Example 1.1.23 For A = {1, 2, 3} and B = {0, 1}, we have A ∩ B = AB = {1}.
The intersection of the intervals [1, 3) and (2, 5] is [1, 3) ∩ (2, 5] = (2, 3). ♦

Example 1.1.24 For any two sets A and B, we have B ∩ B = B, B ∩ B^c = ∅,
B ∩ Ω = B, B ∩ ∅ = ∅, (A ∩ B) ⊆ A, and (A ∩ B) ⊆ B. We also have A ∩ B = A
when A ⊆ B. ♦
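Analogously, the intersection identities of Examples 1.1.23 and 1.1.24 can be checked with Python's `&` operator; an illustrative sketch:

```python
A = {1, 2, 3}
B = {0, 1}

print(A & B)                          # {1}, as in Example 1.1.23
print(B & B == B)                     # B ∩ B = B
print(B & set() == set())             # B ∩ ∅ = ∅
print((A & B) <= A and (A & B) <= B)  # A ∩ B is a subset of both A and B
```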

Example 1.1.25 For three sets A, B, and C, we have (A ∩ B) ⊆ (A ∩ C) when
B ⊆ C. ♦

Fig. 1.4 Partition {A1, A2, . . . , A6} of S

Definition 1.1.17 (disjoint) If A and B have no element in common, that is, if
A ∩ B = AB = ∅, then the sets A and B are called disjoint or mutually exclusive.

Example 1.1.26 The sets C = {1, 2, 3} and D = {4, 5} are mutually exclusive. The
sets A = {1, 2, 3, 4} and B = {4, 5, 6} are not mutually exclusive because A ∩ B =
{4} ≠ ∅. The intervals [1, 3) and [3, 5] are mutually exclusive, and [3, 4] and [4, 5]
are not mutually exclusive. ♦

Definition 1.1.18 (partition) A collection of subsets of S is called a partition of
S when the subsets in the collection are collectively exhaustive and every pair of
subsets in the collection is disjoint. Specifically, the collection {A_i}_{i=1}^n is a partition
of S if both

(collectively exhaustive): ∪_{i=1}^n A_i = S (1.1.11)

and

(disjoint): A_i ∩ A_j = ∅ (1.1.12)

for all i ≠ j are satisfied.

The singleton collection {S} composed only of S is not regarded as a partition of
S. Figure 1.4 shows a partition {A1, A2, . . . , A6} of S.

Example 1.1.27 When A = {1, 2}, the collection {{1}, {2}} is a partition of A.
Each of the five collections {{1}, {2}, {3}, {4}}, {{1}, {2}, {3, 4}}, {{1}, {2, 3}, {4}},
{{1, 2}, {3}, {4}}, and {{1, 2}, {3, 4}} is a partition of B = {1, 2, 3, 4} while neither
{{1, 2, 3}, {3, 4}} nor {{1, 2}, {3}} is a partition of B. ♦

Example 1.1.28 The collection {A, ∅} is a partition of A, and {[3, 3.3), [3.3, 3.4],
(3.4, 3.6], (3.6, 4)} is a partition of the interval [3, 4). ♦

Example 1.1.29 For A = {1, 2, 3}, obtain all the partitions without the null set.

Solution Because the number of elements in A is three, a partition of A with non-


empty sets will be that of one- and two-element sets. Thus, collections {{1}, {2, 3}},
{{2}, {1, 3}}, {{3}, {1, 2}}, and {{1}, {2}, {3}} are the desired partitions. ♦
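Conditions (1.1.11) and (1.1.12) translate directly into a mechanical check. The following sketch, not part of the original text, verifies the four partitions found in Example 1.1.29:

```python
from itertools import combinations

def is_partition(collection, S):
    # (1.1.11): the union of the subsets recovers S (collectively exhaustive)
    exhaustive = set().union(*collection) == S
    # (1.1.12): every pair of subsets in the collection is disjoint
    disjoint = all(a.isdisjoint(b) for a, b in combinations(collection, 2))
    return exhaustive and disjoint

# The four partitions of A = {1, 2, 3} without the null set (Example 1.1.29)
A = {1, 2, 3}
ok = all(is_partition(p, A) for p in [
    [{1}, {2, 3}], [{2}, {1, 3}], [{3}, {1, 2}], [{1}, {2}, {3}],
])
```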
8 1 Preliminaries

Fig. 1.5 Difference A − B between A and B

Definition 1.1.19 (difference) The difference A − B, also denoted by A \ B, is defined as

A − B = {ω : ω ∈ A and ω ∉ B}.    (1.1.13)

Figure 1.5 shows A − B via a Venn diagram. Note that we have

A − B = A ∩ Bc
= A − AB. (1.1.14)

Example 1.1.30 For A = {1, 2, 3} and B = {0, 1}, we have A − B = {2, 3} and
B − A = {0}. The differences between the intervals [1, 3) and (2, 5] are [1, 3) −
(2, 5] = [1, 2] and (2, 5] − [1, 3) = [3, 5]. ♦

Example 1.1.31 For any set A, we have Ω − A = Ac , A − Ω = ∅, A − A = ∅,


A − ∅ = A, A − Ac = A, (A + A) − A = ∅, and A + (A − A) = A. ♦

Definition 1.1.20 (symmetric difference) The symmetric difference, denoted by A Δ B, between two sets A and B is the set of elements which belong only to A or only to B.

From the definition of symmetric difference, we have

A Δ B = (A − B) ∪ (B − A)
      = (A ∩ B^c) ∪ (A^c ∩ B)
      = (A ∪ B) − (A ∩ B).    (1.1.15)

Figure 1.6 shows the symmetric difference A Δ B via a Venn diagram.

Example 1.1.32 For A = {1, 2, 3, 4} and B = {4, 5, 6}, we have A Δ B = {1, 2, 3} ∪ {5, 6} = {1, 2, 3, 4, 5, 6} − {4} = {1, 2, 3, 5, 6}. The symmetric difference between the intervals [1, 3) and (2, 5] is [1, 3) Δ (2, 5] = ([1, 3) − (2, 5]) ∪ ((2, 5] − [1, 3)) = [1, 2] ∪ [3, 5]. ♦
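Equation (1.1.15) gives three equivalent expressions for the symmetric difference. As a quick check outside the text, Python's ^ operator on sets computes exactly this operation:

```python
# The three equivalent forms of the symmetric difference in (1.1.15),
# evaluated on the sets of Example 1.1.32.
A = {1, 2, 3, 4}
B = {4, 5, 6}
d1 = (A - B) | (B - A)   # (A − B) ∪ (B − A)
d2 = (A | B) - (A & B)   # (A ∪ B) − (A ∩ B)
d3 = A ^ B               # Python's built-in symmetric difference
```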

Fig. 1.6 Symmetric difference A Δ B between A and B

Example 1.1.33 For any set A, we have A Δ A = ∅, A Δ ∅ = ∅ Δ A = A, A Δ Ω = Ω Δ A = A^c, and A Δ A^c = A^c Δ A = Ω. ♦

Example 1.1.34 It follows that A − B ≠ B − A in general while A Δ B = B Δ A, and that A Δ B = A − B when B ⊆ A. ♦

Example 1.1.35 (Sveshnikov 1968) Show that every element of A1 Δ A2 Δ · · · Δ An belongs to only an odd number of the sets {Ai}_{i=1}^{n}.

Solution Let us prove the result by mathematical induction. When n = 1, it is self-evident. When n = 2, every element of A1 Δ A2 is an element only of A1, or only of A2, by definition. Next, assume that every element of C = A1 Δ A2 Δ · · · Δ An belongs only to an odd number of the sets {Ai}_{i=1}^{n}. Then, by the definition of Δ, every element of B = C Δ A_{n+1} belongs only to A_{n+1} or only to C. In other words, every element of B belongs only to A_{n+1} or only to an odd number of the sets {Ai}_{i=1}^{n}, concluding the proof. ♦
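The odd-membership property of Example 1.1.35 can also be confirmed element by element on concrete sets; the following sketch (not from the text, with arbitrarily chosen sets) folds Δ over a list and compares with a direct parity count:

```python
from functools import reduce

# Repeated symmetric difference A1 Δ A2 Δ ... Δ An, computed by folding ^.
sets = [{1, 2, 3}, {2, 3, 4}, {3, 5}, {1, 5, 6}]
sym_diff = reduce(lambda a, b: a ^ b, sets)

# Example 1.1.35: x lies in the fold exactly when x belongs to an odd
# number of the sets Ai.
universe = set().union(*sets)
odd_membership = {x for x in universe if sum(x in s for s in sets) % 2 == 1}
```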

Interestingly, the set operation similar to the addition of numbers is not the union of sets but rather the symmetric difference (Kuratowski and Mostowski 1976): subtraction is the inverse operation for addition, and symmetric difference is its own inverse operation, while no inverse operation exists for the union of sets. More specifically, for two sets A and B, there exists only one C, which is C = A Δ B, such that A Δ C = B. This is clear because A Δ (A Δ B) = B and A Δ C = B.

1.1.3 Laws of Set Operations

Theorem 1.1.3 For the operations of union and intersection, the following laws
apply:
1. Commutative law

A∪B = B∪ A (1.1.16)
A∩B = B∩ A (1.1.17)

2. Associative law

(A ∪ B) ∪ C = A ∪ (B ∪ C) (1.1.18)
(A ∩ B) ∩ C = A ∩ (B ∩ C) (1.1.19)

3. Distributive law

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) (1.1.20)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C) (1.1.21)

Note that the associative and distributive laws are for the same and different types
of operations, respectively.
Example 1.1.36 Assume three sets A = {0, 1, 2, 6}, B = {0, 2, 3, 4}, and C =
{0, 1, 3, 5}. It is easy to check the commutative and associative laws. Next, (1.1.20)
holds true as it is clear from (A ∪ B) ∩ C = {0, 1, 2, 3, 4, 6} ∩ {0, 1, 3, 5} = {0, 1, 3}
and (A ∩ C) ∪ (B ∩ C) = {0, 1} ∪ {0, 3} = {0, 1, 3}. In addition, (1.1.21) holds
true as it is clear from (A ∩ B) ∪ C = {0, 2} ∪ {0, 1, 3, 5} = {0, 1, 2, 3, 5} and
(A ∪ C) ∩ (B ∪ C) = {0, 1, 2, 3, 5, 6} ∩ {0, 1, 2, 3, 4, 5} = {0, 1, 2, 3, 5}. ♦
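The arithmetic of Example 1.1.36 is mechanical and can be reproduced in a few lines; this sketch (not in the original text) checks both distributive laws on the same three sets:

```python
# Checking the distributive laws (1.1.20) and (1.1.21) on the sets
# of Example 1.1.36.
A, B, C = {0, 1, 2, 6}, {0, 2, 3, 4}, {0, 1, 3, 5}
left_1 = (A | B) & C            # (A ∪ B) ∩ C
right_1 = (A & C) | (B & C)     # (A ∩ C) ∪ (B ∩ C)
left_2 = (A & B) | C            # (A ∩ B) ∪ C
right_2 = (A | C) & (B | C)     # (A ∪ C) ∩ (B ∪ C)
```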
Generalizing (1.1.20) and (1.1.21) for a number of sets, we have

B ∩ (∪_{i=1}^{n} Ai) = ∪_{i=1}^{n} (B ∩ Ai)    (1.1.22)

and

B ∪ (∩_{i=1}^{n} Ai) = ∩_{i=1}^{n} (B ∪ Ai),    (1.1.23)

respectively.
Theorem 1.1.4 When A1, A2, . . ., An are subsets of an abstract space S, we have

(∪_{i=1}^{n} Ai)^c = S − ∪_{i=1}^{n} Ai
                   = ∩_{i=1}^{n} (S − Ai)
                   = ∩_{i=1}^{n} Ai^c    (1.1.24)

and

(∩_{i=1}^{n} Ai)^c = ∪_{i=1}^{n} Ai^c.    (1.1.25)

Proof Let us prove the theorem by using (1.1.2).

(1) Proof of (1.1.24). Let x ∈ (∪_{i=1}^{n} Ai)^c. Then, x ∉ ∪_{i=1}^{n} Ai, and therefore x is not an element of any Ai. This implies that x is an element of every Ai^c or, equivalently, x ∈ ∩_{i=1}^{n} Ai^c. Therefore, we have

(∪_{i=1}^{n} Ai)^c ⊆ ∩_{i=1}^{n} Ai^c.    (1.1.26)

Next, assume x ∈ ∩_{i=1}^{n} Ai^c. Then, x ∈ Ai^c for every i, and therefore x is not an element of Ai for any i: in other words, x ∉ ∪_{i=1}^{n} Ai. This implies x ∈ (∪_{i=1}^{n} Ai)^c. Therefore, we have

∩_{i=1}^{n} Ai^c ⊆ (∪_{i=1}^{n} Ai)^c.    (1.1.27)

From (1.1.2), (1.1.26), and (1.1.27), the result (1.1.24) is proved.

(2) Proof of (1.1.25). Replacing Ai with Ai^c in (1.1.24), we have (∪_{i=1}^{n} Ai^c)^c = ∩_{i=1}^{n} (Ai^c)^c = ∩_{i=1}^{n} Ai. Taking the complement of both sides completes the proof. ♠

Example 1.1.37 Consider S = {1, 2, 3, 4} and its subsets A1 = {1}, A2 = {2, 3},
and A3 = {1, 3, 4}. Then, we have (A1 + A2 )c = {1, 2, 3}c = {4}, which is the same
as Ac1 Ac2 = {2, 3, 4} ∩ {1, 4} = {4}. Similarly, we have (A1 A2 )c = ∅c = S, which
is the same as Ac1 + Ac2 = {2, 3, 4} ∪ {1, 4} = S. In addition, (A2 + A3 )c = S c = ∅
is the same as Ac2 Ac3 = {1, 4} ∩ {2} = ∅. Finally, (A1 A3 )c = {1}c = {2, 3, 4} is the
same as Ac1 + Ac3 = {2, 3, 4} ∪ {2} = {2, 3, 4}. ♦
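De Morgan's laws (1.1.24) and (1.1.25) can be spot-checked numerically; this sketch, not part of the original text, replays the first two identities of Example 1.1.37 with complements taken relative to S:

```python
# De Morgan's laws on the sets of Example 1.1.37.
S = {1, 2, 3, 4}
A1, A2 = {1}, {2, 3}
lhs_union = S - (A1 | A2)        # (A1 ∪ A2)^c
rhs_union = (S - A1) & (S - A2)  # A1^c ∩ A2^c
lhs_inter = S - (A1 & A2)        # (A1 ∩ A2)^c
rhs_inter = (S - A1) | (S - A2)  # A1^c ∪ A2^c
```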

Example 1.1.38 For three sets A, B, and C, show that A = B − C when A ⊆ B


and C = B − A.

Solution First, assume A = B. Then, because C = B − A = ∅, we have B − C =


B − ∅ = A. Next, assume A ⊂ B. Then, C = B ∩ Ac and C c = (B ∩ Ac )c = B c ∪
A from (1.1.14) and (1.1.25), respectively. Thus, using (1.1.14) and (1.1.20), we get
B − C = B ∩ C c = B ∩ (B c ∪ A) = (B ∩ B c ) ∪ (B ∩ A) = ∅ ∪ A = A. ♦

1.1.4 Uncountable Sets

Definition 1.1.21 (one-to-one correspondence) A relationship between two sets in


which each element of either set is assigned with only one element of the other is
called a one-to-one correspondence.

The notion of one-to-one correspondence will be redefined in Definition 1.2.9.


Based on the set J+ of natural numbers defined in (1.1.3) and the concept of one-to-
one correspondence, let us define a countable set.

Definition 1.1.22 (countable set) A set is called countable or denumerable if we


can find a one-to-one correspondence between the set and a subset of J+ .

The elements of a countable set can be indexed as a1 , a2 , . . . , an , . . .. It is easy to


see that finite sets are all countable sets.

Example 1.1.39 The sets {1, 2, 3} and {1, 10, 100, 1000} are both countable because
a one-to-one correspondence can be established between these two sets and the sub-
sets {1, 2, 3} and {1, 2, 3, 4}, respectively, of J+ . ♦

Example 1.1.40 The set J of integers is countable because we can establish a one-to-one correspondence as

 0  −1   1  −2   2  · · ·  −n    n    · · ·
 ↕   ↕   ↕   ↕   ↕  · · ·   ↕    ↕    · · ·    (1.1.28)
 1   2   3   4   5  · · ·  2n  2n+1   · · ·

between J and J+. Similarly, it is easy to see that the sets {ω : ω is a positive even number} and {2, 4, . . . , 2^n, . . .} are countable sets by noting the one-to-one correspondences 2n ↔ n and 2^n ↔ n, respectively. ♦
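The pairing in (1.1.28) can be written as explicit formulas, one in each direction; the following sketch (not from the text) implements the correspondence and confirms it is a bijection on a finite window:

```python
# The correspondence (1.1.28): 0↔1, −1↔2, 1↔3, −2↔4, 2↔5, ..., −n↔2n, n↔2n+1.
def integer_to_natural(k):
    return 2 * k + 1 if k >= 0 else -2 * k

def natural_to_integer(n):
    return -(n // 2) if n % 2 == 0 else n // 2
```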

Theorem 1.1.5 The set

Q = {q : q is a rational number} (1.1.29)

of rational numbers is countable.


 
Proof (Method 1) A rational number can be expressed as an element of {p/q : q ∈ J+, p ∈ J}. Consider the sequence

a_ij = { 0/1,              i = j = 0;
       { −(i−j+1)/j,       i = 1, 2, . . . , j = 1, 2, . . . , i;    (1.1.30)
       { (j−i)/(2i−j+1),   i = 1, 2, . . . , j = i+1, i+2, . . . , 2i.

In other words, consider

i = 0 :  0/1
i = 1 :  −1/1, 1/1
i = 2 :  −2/1, −1/2, 1/2, 2/1    (1.1.31)
i = 3 :  −3/1, −2/2, −1/3, 1/3, 2/2, 3/1
  ...

Reading this sequence downward from the first row and ignoring repetitions, we will have a one-to-one correspondence between the sets of rational numbers and natural numbers.

(Method 2) Assume integers x ≠ 0 and y, and denote the rational number y/x by the coordinates (x, y) on a two-dimensional plane. Reading the integer coordinates as (1, 0) → (1, 1) → (−1, 1) → (−1, −1) → (2, −1) → (2, 2) → · · · while skipping a number if it had previously appeared, we have a one-to-one correspondence between J+ and Q. ♠
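Method 1 can be simulated for the first few rows; this sketch (not in the original text) generates the rows of (1.1.31), skips repetitions exactly as the proof prescribes, and checks that the resulting list has no duplicates:

```python
from fractions import Fraction

# Enumerate rationals row by row following (1.1.30)-(1.1.31): row i lists
# −i/1, −(i−1)/2, ..., −1/i, 1/i, ..., i/1. Repetitions (e.g. −2/2 = −1/1)
# are skipped, so each rational receives exactly one index.
def rationals(rows):
    seen = {Fraction(0, 1)}
    out = [Fraction(0, 1)]
    for i in range(1, rows + 1):
        row = [Fraction(-(i - j + 1), j) for j in range(1, i + 1)]
        row += [Fraction(j - i, 2 * i - j + 1) for j in range(i + 1, 2 * i + 1)]
        for q in row:
            if q not in seen:
                seen.add(q)
                out.append(q)
    return out

qs = rationals(5)
```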

Theorem 1.1.6 Countable sets have the following properties:

(1) A subset of a countable set is countable.
(2) There exists a countable subset for any infinite set.
(3) If the sets A1, A2, . . . are all countable, then ∪_{i∈J+} Ai = ∪_{i=1}^{∞} Ai = lim_{n→∞} ∪_{i=1}^{n} Ai is also countable.

Proof (1) For a finite set, it is obvious. For an infinite set, denote a countable set by A = {a1, a2, . . .}. Then, a subset of A can be expressed as B = {a_{n_1}, a_{n_2}, . . .}, and we can find a one-to-one correspondence i ↔ a_{n_i} between J+ and B.
(2) We can choose a countable subset {a1, a2, . . .} arbitrarily from an infinite set.
(3) Consider the sequence B1, B2, . . . defined as

B1 = A1 = {b_{1j}},    (1.1.32)
B2 = A2 − A1 = {b_{2j}},    (1.1.33)
B3 = A3 − (A1 ∪ A2) = {b_{3j}},    (1.1.34)
...

Clearly, B1, B2, . . . are mutually exclusive and ∪_{i=1}^{∞} Ai = ∪_{i=1}^{∞} Bi. Because Bi ⊆ Ai, the sets B1, B2, . . . are all countable from Property (1). Next, arrange the elements of B1, B2, . . . as

b11 → b12   b13 → b14 · · ·
    ↙     ↗     ↙
b21   b22   b23   b24 · · ·
 ↓  ↗     ↙                        (1.1.35)
b31   b32   b33   b34 · · ·
 ...

and read them in the order as directed by the arrows, which represents a one-to-one correspondence between J+ and ∪_{i=1}^{∞} Bi = ∪_{i=1}^{∞} Ai.

Property (3) of Theorem 1.1.6 also implies that a countably infinite union of
countable sets is a countable set.
Example 1.1.41 (Sveshnikov 1968) Show that the Cartesian product A1 × A2 × · · · × An = {(a_{1i_1}, a_{2i_2}, . . . , a_{ni_n})} of a finite number of countable sets is countable.
Solution It suffices to show that the Cartesian product A × B is countable when
A and B are countable. Denote two countable sets by A = {a1 , a2 , . . .} and
B = {b1 , b2 , . . .}. If we arrange the elements of the Cartesian product A × B as
(a1 , b1 ) , (a1 , b2 ) , (a2 , b1 ) , (a1 , b3 ) , (a2 , b2 ) , (a3 , b1 ) , . . ., then it is apparent that
the Cartesian product is countable. ♦
Example 1.1.42 Show that the set of finite sequences from a countable set is countable.

Solution The set Bk of finite sequences with length k from a countable set A is equivalent to the k-fold Cartesian product A^k = {(b1, b2, . . . , bk) : bj ∈ A} of A. Then, Bk is countable from Example 1.1.41. Next, the set of finite sequences is the countable union ∪_{k=1}^{∞} Bk, which is countable from (3) of Theorem 1.1.6. ♦

Example 1.1.43 The set A = {a^b : a, b ∈ Q1} is countable, where Q1 = Q − {0} with Q the set of rational numbers. ♦
Example 1.1.44 The set Υ_T of infinite binary sequences with a finite period is countable. First, note that there exist two sequences with period 2, six sequences with period 3, . . ., at most 2^k − 2 sequences with period k, . . .. (We assume, for example, that · · · 1010 · · · and · · · 0101 · · · are different.) Based on this observation, we can find the one-to-one correspondence

· · · 00 · · · → 1, · · · 11 · · · → 2, · · · 0101 · · · → 3,
· · · 1010 · · · → 4, · · · 001001 · · · → 5, · · · 010010 · · · → 6,
· · · 100100 · · · → 7, · · · 011011 · · · → 8, · · · 101101 · · · → 9, · · ·

between Υ_T and J+. ♦

Definition 1.1.23 (uncountable set) When no one-to-one correspondence exists


between a set and a subset of J+ , the set is called uncountable or non-denumerable.

As it has already been mentioned, finite sets are all countable. On the other hand,
some infinite sets are countable and some are uncountable.

Theorem 1.1.7 The interval set [0, 1] = R[0,1] = {x : 0 ≤ x ≤ 1}, i.e., the set of
real numbers in the interval [0, 1], is uncountable.

Proof We prove the theorem by contradiction. Letting ai j ∈ {0, 1, . . . , 9}, the ele-
ments of the set R[0,1] can be expressed as 0.ai1 ai2 · · · ain · · · . Assume R[0,1] is
countable: in other words, assume all the elements of R[0,1] are enumerated as

α1 = 0.a11 a12 · · · a1n · · · (1.1.36)


α2 = 0.a21 a22 · · · a2n · · · (1.1.37)
..
.
αn = 0.an1 an2 · · · ann · · · (1.1.38)
..
.

Now, consider a number β = 0.b1 b2 · · · bn · · · , where bi ≠ aii and bi ∈ {0, 1, . . . , 9}. Then, it is clear that β ∈ R[0,1]. We also have β ≠ α1 because b1 ≠ a11, β ≠ α2 because b2 ≠ a22, · · · : in short, β is not equal to any αi. In other words, although β is an element of R[0,1], it is not included in the enumeration, and produces a contradiction to the assumption that all the numbers in R[0,1] have been enumerated. Therefore, R[0,1] is uncountable. ♠
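A finite caricature of the diagonal construction in this proof is easy to run; the following sketch (not in the book, with an arbitrary list of digit rows) builds a sequence that differs from the n-th row in its n-th digit and therefore appears nowhere in the list:

```python
# Diagonal construction from the proof of Theorem 1.1.7, on finite rows:
# beta differs from row n in position n, so beta equals no listed row.
listed = [
    [1, 4, 1, 5, 9],
    [2, 7, 1, 8, 2],
    [5, 7, 7, 2, 1],
    [3, 3, 3, 3, 3],
    [0, 0, 0, 0, 0],
]
beta = [(row[n] + 1) % 10 for n, row in enumerate(listed)]
not_listed = all(beta != row for row in listed)
```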

Example 1.1.45 The set Υ = {s = (a1 a2 · · ·) : ai ∈ {0, 1}} of one-sided infinite binary sequences is uncountable, which can be shown via a method similar to the proof of Theorem 1.1.7. Specifically, assume Υ is countable, and all the elements s_i of Υ are arranged as

s1 = (a11 a12 · · · a1n · · ·)    (1.1.39)
s2 = (a21 a22 · · · a2n · · ·)    (1.1.40)
...

where aij ∈ {0, 1}. Denote the complement of a binary digit x by x̄. Then, the sequence (ā11 ā22 · · ·) produces a contradiction. Therefore, Υ is uncountable. ♦

Example 1.1.46 (Gelbaum and Olmsted 1964) Consider the closed interval U = [0, 1]. The open interval D11 = (1/3, 2/3) is removed from U in the first step, two open intervals D21 = (1/9, 2/9) and D22 = (7/9, 8/9) are removed from the remaining region [0, 1/3] ∪ [2/3, 1] in the second step, . . ., 2^{k−1} open intervals of length 3^{−k} are removed in the k-th step, . . .. The limit of the remaining region C of this procedure is called the Cantor set or Cantor ternary set. The procedure can equivalently be described as follows: Denote an open interval with the starting point

ζ1k = 1/3^k + Σ_{j=0}^{k−1} 2c_j/3^j    (1.1.41)

and ending point

ζ2k = 2/3^k + Σ_{j=0}^{k−1} 2c_j/3^j    (1.1.42)

by A_{2c_0, 2c_1, 2c_2, . . . , 2c_{k−1}}, where c_0 = 0 and c_j = 0 or 1 for j = 1, 2, . . . , k − 1. Then, at the k-th step in the procedure of obtaining the Cantor set C, we are removing the 2^{k−1} open intervals A_{2c_0, 2c_1, 2c_2, . . . , 2c_{k−1}} of length 3^{−k}. Specifically, we have A_0 = (1/3, 2/3) when k = 1; A_{0,0} = (1/9, 2/9) and A_{0,2} = (7/9, 8/9) when k = 2; A_{0,0,0} = (1/27, 2/27), A_{0,0,2} = (7/27, 8/27), A_{0,2,0} = (19/27, 20/27), and A_{0,2,2} = (25/27, 26/27) when k = 3; · · · . ♦

The Cantor set C described in Example 1.1.46 has the following properties:

(1) The set C can be expressed as C = ∩_{i=1}^{∞} Bi, where B1 = [0, 1], B2 = [0, 1/3] ∪ [2/3, 1], B3 = [0, 1/9] ∪ [2/9, 3/9] ∪ [6/9, 7/9] ∪ [8/9, 1], . . ..
(2) The set C is an uncountable and closed set.
(3) The length of the union of the open intervals removed when obtaining C is 1/3 + 2/3² + 2²/3³ + · · · = (1/3)/(1 − 2/3) = 1. Consequently, the length of C is 0.
(4) The set C is the set of ternary real numbers between 0 and 1 that can be represented without using the digit 1. In other words, every element of C can be expressed as Σ_{n=1}^{∞} x_n/3^n, x_n ∈ {0, 2}.

In Sect. 1.3.3, the Cantor set is used as the basis for obtaining a singular function.
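The iterative construction behind property (1) is easy to simulate with exact rational arithmetic; this sketch (not part of the original text) replaces each closed interval by its two outer thirds and confirms the total length (2/3)^k after k steps:

```python
from fractions import Fraction

# One step of the Cantor construction: each closed interval [a, b] is
# replaced by its outer thirds, removing the middle open third.
def cantor_step(intervals):
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out += [(a, a + third), (b - third, b)]
    return out

intervals = [(Fraction(0), Fraction(1))]
for _ in range(5):
    intervals = cantor_step(intervals)

# After k steps: 2^k intervals, each of length 3^(-k).
total_length = sum(b - a for a, b in intervals)
```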

Example 1.1.47 (Gelbaum and Olmsted 1964) The Cantor set C considered in Example 1.1.46 has a length 0. A Cantor set with a length greater than 0 can be obtained similarly. For example, consider the interval [0, 1] and a constant α ∈ (0, 1]. In the first step, an open interval (1/2 − α/4, 1/2 + α/4) of length α/2 is removed. In the second step, an open interval each of length α/8 is removed at the center of the two closed intervals remaining. In the third step, an open interval each of length α/32 is removed at the center of the four closed intervals remaining. . . .. Then, this Cantor set is a set of length 1 − α because the sum of the lengths of the regions removed is α/2 + α/4 + α/8 + · · · = α. A Cantor set with a non-zero length is called the Smith-Volterra-Cantor set or fat Cantor set. ♦

Table 1.1 Finite, infinite, countable, and uncountable sets

             | Countable (enumerable, denumerable)        | Uncountable (non-denumerable)
  Finite     | Finite. Example: {3, 4, 5}                 | —
  Infinite   | Countably infinite. Example: {1, 2, . . .} | Uncountably infinite. Example: (0, 1]

Example 1.1.48 As shown in Table 1.1, the term countable set denotes a finite set
or a countably infinite set, and the term infinite set denotes a countably infinite set
or an uncountably infinite, simply called uncountable, set. ♦

Definition 1.1.24 (almost everywhere) In real space, when the length of the union
of countably many intervals is arbitrarily small, a set of points that can be contained
in the union is called a set of length 0. In addition, ‘at all points except for a set
of length 0’ is called ‘almost everywhere’, ‘almost always’, ‘almost surely’, ‘with
probability 1’, ‘almost certainly’, or ‘at almost every point’.

In the integer or discrete space (Jones 1982), ‘almost everywhere’ denotes ‘all
points except for a finite set’.

Example 1.1.49 The intervals [1, 2) and (1, 2) are the same almost everywhere.
The sets {1, 2, . . .} and {2, 3, . . .} are the same almost everywhere. ♦

Definition 1.1.25 (equivalence) If we can find a one-to-one correspondence between


two sets M and N , then M and N are called equivalent, or of the same cardinality,
and are denoted by M ∼ N .

Example 1.1.50 For the two sets A = {4, 2, 1, 9} and B = {8, 0, 4, 5}, we have
A ∼ B. ♦

Example 1.1.51 If A ∼ B, then 2^A ∼ 2^B. ♦

Example 1.1.52 (Sveshnikov 1968) Show that A ∪ B ∼ A when A is an infinite


set and B is a countable set.

Solution First, arbitrarily choose a countably infinite set A0 = {a0 , a1 , . . .} from A.


Because A0 ∪ B is also a countably infinite set, we have a one-to-one correspondence
between A0 ∪ B and A0 : denote the one-to-one correspondence by g(x). Then,

Fig. 1.7 The function y = tan(π(x − 1/2)) showing the equivalence (0, 1) ∼ R between the interval (0, 1) and the set R of real numbers


h(x) = { g(x),  x ∈ A0 ∪ B,
       { x,     x ∉ A0 ∪ B    (1.1.43)

is a one-to-one correspondence between A ∪ B and A: that is, A ∪ B ∼ A. ♦

Example 1.1.53 The set J+ of natural numbers is equivalent to the set J of integers.
The set of irrational numbers is equivalent to the set R of real numbers from Exercise
1.16. ♦

Example 1.1.54 It is interesting to note that the set of irrational numbers is not
closed under certain basic operations, such as addition and multiplication, while the
much smaller set of rational numbers is closed under such operations. ♦

Example 1.1.55 The Cantor and Smith-Volterra-Cantor sets considered in Exam-


ples 1.1.46 and 1.1.47, respectively, are both equivalent to the set R of real numbers.

Example 1.1.56 As shown in Fig. 1.7, it is easy to see that (0, 1) ∼ R via y = tan(π(x − 1/2)). It is also clear that [a, b] ∼ [c, d] when a < b and c < d because a point x between a and b, and a point y between c and d, have the one-to-one correspondence y = ((d − c)/(b − a))(x − a) + c. ♦
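The equivalence (0, 1) ∼ R of Example 1.1.56 can be probed numerically; this sketch (not from the text) implements the map of Fig. 1.7 together with its inverse and checks the round trip at a few points:

```python
import math

# The bijection y = tan(pi*(x - 1/2)) from (0, 1) onto R, and its inverse.
def to_real(x):
    return math.tan(math.pi * (x - 0.5))

def to_unit(y):
    return math.atan(y) / math.pi + 0.5

roundtrip_ok = all(abs(to_unit(to_real(x)) - x) < 1e-9
                   for x in [0.01, 0.25, 0.5, 0.75, 0.99])
```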

Example 1.1.57 The set of real numbers R is uncountable. The intervals [a, b],
[a, b), (a, b], and (a, b) are all uncountable for any real number a and b > a from
Theorem 1.1.7 and Example 1.1.56. ♦

Theorem 1.1.8 When A is equivalent to a subset of B and B is equivalent to a


subset of A, then A is equivalent to B.

Example 1.1.58 As we have observed in Theorem 1.1.5, the set J+ of natural numbers is equivalent to the set Q of rational numbers. Similarly, the subset Q3 = {t : t = q/3, q ∈ J} of Q is equivalent to the set J of integers. Therefore, recollecting that J+ is a subset of J and Q3 is a subset of Q, Theorem 1.1.8 dictates that J ∼ Q. ♦

Example 1.1.59 It is interesting to note that, although J+ is a proper subset of J,


which is in turn a proper subset of Q, the three sets J+ , J, and Q are all equivalent.
For finite sets, on the other hand, such an equivalence is impossible when one set
is a proper subset of the other. This exemplifies that the infinite and finite spaces
sometimes produce different results. ♦

1.2 Functions

In this section, we will introduce and briefly review some key concepts within the
theory of functions (Ito 1987; Royden 1989; Stewart 2012).
Definition 1.2.1 (mapping) A relation f that assigns every element of a set Ω with
only one element of another set A is called a function or mapping and is often denoted
by f : Ω → A.
For the function f : Ω → A, the sets Ω and A are called the domain and
codomain, respectively, of f .
Example 1.2.1 Assume the domain Ω = [−1, 1] and the codomain A = [−2, 1].
The relation that connects all the points in [−1, 0) of the domain with −1 in the
codomain, and all the points in [0, 1] of the domain with 1 in the codomain is a
function. ♦
Example 1.2.2 Assume the domain Ω = [−1, 1] and the codomain A = [−2, 1].
The relation that connects all the points in [−1, 0) of the domain with −1 in the
codomain, and all the points in (0, 1] of the domain with 1 in the codomain is not
a function because the point 0 in the domain is not connected with any point in
the codomain. In addition, the relation that connects all the points in [−1, 0] of the
domain with −1 in the codomain, and all the points in [0, 1] of the domain with 1 in
the codomain is not a function because the point 0 in the domain is connected with
more than one point in the codomain. ♦
Definition 1.2.2 (set function) A function whose domain is a collection of sets is
called a set function.
Example 1.2.3 Let the domain be the power set 2^C = {∅, {3}, {4}, {5}, {3, 4}, {3, 5}, {4, 5}, {3, 4, 5}} of C = {3, 4, 5}. Define a function f(B) for B ∈ 2^C as the number of elements in B. Then, f is a set function, and we have f({3}) = 1, f({3, 4}) = 2, and f({3, 4, 5}) = 3, for example. ♦

Definition 1.2.3 (image) For a function f : Ω → A and a subset G of Ω, the set

f (G) = {a : a = f (ω), ω ∈ G}, (1.2.1)

which is a subset of A, is called the image of G (under f ).



Fig. 1.8 Image f(G) ⊆ A of G ⊆ Ω for a function f : Ω → A

Fig. 1.9 Range f(Ω) ⊆ A of a function f : Ω → A

Definition 1.2.4 (range) For a function f : Ω → A, the image f (Ω) is called the
range of the function f .

The image f (G) of G ⊆ Ω and the range f (Ω) are shown in Figs. 1.8 and 1.9,
respectively.

Example 1.2.4 For the domain Ω = [−1, 1] and the codomain A = [−10, 10], consider the function f(ω) = ω². The image of the subset G1 = (−1/2, 1/2) of the domain Ω is f(G1) = [0, 0.25), and the image of G2 = (0.1, 0.2) is f(G2) = (0.01, 0.04). ♦

Example 1.2.5 The image of G = {{3}, {3, 4}} in Example 1.2.3 is f (G) = {1, 2}.

Example 1.2.6 Consider the domain Ω = [−1, 1] and codomain A = [−2, 1].
Assume a function f for which all the points in [−1, 0) ⊆ Ω are mapped to
−1 ∈ A and all the points in [0, 1] ⊆ Ω are mapped to 1 ∈ A. Then, the range
f (Ω) = f ([−1, 1]) of f is {−1, 1}, which is different from the codomain A. In
Example 1.2.3, the range of f is {0, 1, 2, 3}. ♦

As we observed in Example 1.2.6, the range and codomain are not necessarily the
same.

Definition 1.2.5 (inverse image) For a function f : Ω → A and a subset H of A,


the subset

f −1 (H ) = {ω : f (ω) ∈ H }, (1.2.2)

shown in Fig. 1.10, of Ω is called the inverse image of H (under f ).



Fig. 1.10 Inverse image f⁻¹(H) ⊆ Ω of H ⊆ A for a function f : Ω → A

Example 1.2.7 Consider the function f (ω) = ω 2 with domain Ω = [−1, 1] and
codomain A = [−10, 10]. The inverse image of a subset H1 = (−0.25, 1) of codomain
A is f −1 (H1 ) = (−1, 1), and the inverse image of H2 = (−0.25, 0) is f −1 (H2 ) =
f −1 ((−0.25, 0)) = ∅. ♦

Example 1.2.8 In Example 1.2.3, the inverse image of H = {3} is f −1 (H ) =


{{3, 4, 5}}. ♦
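Definitions (1.2.1) and (1.2.2) are directly computable on finite domains; the following sketch (not part of the original text) discretizes Example 1.2.7's f(ω) = ω² on a small grid of integers:

```python
# Image (1.2.1) and inverse image (1.2.2) of a function on a finite domain.
def image(f, G):
    return {f(w) for w in G}

def inverse_image(f, domain, H):
    return {w for w in domain if f(w) in H}

domain = [-2, -1, 0, 1, 2]
f = lambda w: w * w
img = image(f, {-1, 0, 1})          # image of a subset of the domain
pre = inverse_image(f, domain, {1, 4})  # inverse image of a subset of the codomain
```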

Definition 1.2.6 (surjection) When the range and codomain of a function are the
same, the function is called an onto function, a surjective function, or a surjection.

If the range and codomain of a function are not the same, that is, if the range is a
proper subset of the codomain, then the function is called an into function.

Definition 1.2.7 (injection) When the inverse image for every element of the
codomain of a function has at most one element, i.e., when the inverse image for
every element of the range of a function has only one element, the function is called
an injective function, a one-to-one function, a one-to-one mapping, or an injection.

In Definition 1.2.7, ‘... function has at most one element, i.e., ...’ can be replaced
with ‘... function is a null set, a singleton set, or a singleton collection of sets, i.e., ...’,
and ‘... has only one element, ...’ with ‘... is a singleton set or a singleton collection
of sets, ...’.

Example 1.2.9 For the domain Ω = [−1, 1] and the codomain A = [0, 1], consider
the function f (ω) = ω 2 . Then, f is a surjective function because its range is the same
as the codomain, and f is not an injective function because, for any non-zero point
of the range, the inverse image has two elements. ♦

Example 1.2.10 For the domain Ω = [−1, 1] and the codomain A = [−2, 2], con-
sider the function f (ω) = ω. Then, because the range [−1, 1] is not the same as the
codomain, the function f is not a surjection. Because the inverse image of every
element in the range is a singleton set, the function f is an injection. ♦

Example 1.2.11 For the domain Ω = {{1}, {2, 3}} and the codomain A = {3, {4},
{5, 6, 7}}, consider the function f ({1}) = 3, f ({2, 3}) = {4}. Because the range
{3, {4}} is not the same as the codomain, the function f is not a surjection. Because

the inverse image of every element in the range has only one4 element, the function f is an injection. ♦
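For functions on finite sets, Definitions 1.2.6 and 1.2.7 reduce to simple counting; this sketch (not from the text, representing a function as a dict) tests surjectivity and injectivity:

```python
# A finite function as a dict from domain elements to codomain elements.
def is_surjection(f, codomain):
    # surjective: the range equals the codomain
    return set(f.values()) == set(codomain)

def is_injection(f):
    # injective: no two domain elements share a value
    return len(set(f.values())) == len(f)

# f(w) = w^2 on {-1, 0, 1} with codomain {0, 1}: surjective, not injective
f = {-1: 1, 0: 0, 1: 1}
surj = is_surjection(f, {0, 1})
inj = is_injection(f)
```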

1.2.1 One-to-One Correspondence

The notions of one-to-one mapping and one-to-one correspondence defined in Defi-


nitions 1.2.7 and 1.1.21, respectively, can be alternatively defined as in the following
definitions:
Definition 1.2.8 (one-to-one mapping) A mapping is called one-to-one if the inverse
image of every singleton set in the range is a singleton set.
Definition 1.2.9 (one-to-one correspondence) When the inverse image of every ele-
ment in the codomain is a singleton set, the function is called a one-to-one corre-
spondence.
A one-to-one correspondence is also called a bijection, a bijective function, or a
bijective mapping. A bijective function is a surjection and an injection at the same
time. A one-to-one correspondence is a one-to-one mapping for which the range
and codomain are the same. For a one-to-one mapping that is not a one-to-one
correspondence, the range is a proper subset of the codomain.
Example 1.2.12 For the domain Ω = [−1, 1] and the codomain A = [−1, 1], con-
sider the function f (ω) = ω. Then, f is a surjective function because the range is
the same as the codomain, and f is an injective function because, for every point
of the range, the inverse image is a singleton set. In other words, f is a one-to-one
correspondence and a bijective function. ♦
Theorem 1.2.1 When f is a one-to-one correspondence, we have

f (A ∪ B) = f (A) ∪ f (B), (1.2.3)


f (A ∩ B) = f (A) ∩ f (B), (1.2.4)
f −1 (C ∪ D) = f −1 (C) ∪ f −1 (D), (1.2.5)

and

f −1 (C ∩ D) = f −1 (C) ∩ f −1 (D) (1.2.6)

for subsets A and B of the domain and subsets C and D of the range.
Proof Let us show (1.2.5) only. First, when x ∈ f −1 (C ∪ D), we have f (x) ∈ C
or f (x) ∈ D. Then, because x ∈ f −1 (C) or x ∈ f −1 (D), we have x ∈ f −1 (C) ∪

4 The inverse image of the element {4} of the range is not {2, 3} but {{2, 3}}, which has only one
element {2, 3}.

f −1 (D). Next, when x ∈ f −1 (C) ∪ f −1 (D), we have f (x) ∈ C or f (x) ∈ D. Thus,


we have f (x) ∈ C ∪ D, and it follows that x ∈ f −1 (C ∪ D). ♠

Theorem 1.2.1 implies that, if f is a one-to-one correspondence, not only can


the images f (A ∪ B) and f (A ∩ B) be expressed in terms of the images f (A) and
f (B), but the inverse images f −1 (C ∪ D) and f −1 (C ∩ D) can also be expressed
in terms of the inverse images f −1 (C) and f −1 (D).
Generalizing (1.2.3)–(1.2.6), we have

f(∪_{i=1}^{n} Ai) = ∪_{i=1}^{n} f(Ai),    (1.2.7)

f(∩_{i=1}^{n} Ai) = ∩_{i=1}^{n} f(Ai),    (1.2.8)

f⁻¹(∪_{i=1}^{m} Ci) = ∪_{i=1}^{m} f⁻¹(Ci),    (1.2.9)

and

f⁻¹(∩_{i=1}^{m} Ci) = ∩_{i=1}^{m} f⁻¹(Ci)    (1.2.10)

if f is a one-to-one correspondence, where {Ai}_{i=1}^{n} are subsets of the domain and {Ci}_{i=1}^{m} are subsets of the range.

1.2.2 Metric Space

Definition 1.2.10 (distance function) A function d satisfying the three conditions


below for every three points p, q, and r is called a distance function or a metric.
(1) d( p, q) = d(q, p).
(2) d( p, q) > 0 if p ≠ q and d( p, q) = 0 if p = q.
(3) d( p, q) ≤ d( p, r ) + d(r, q).
Here, d( p, q) is called the distance between p and q.

Example 1.2.13 For two elements a and b in the set R of real numbers, assume
the function d(a, b) = |a − b|. Then, we have |a − b| = |b − a|, |a − b| > 0 when
a ≠ b, and |a − b| = 0 when a = b. We also have |a − c| + |c − b| ≥ |a − b| from
(|α| + |β|)2 − |α + β|2 = 2 (|α||β| − αβ) ≥ 0 for real numbers α and β. Therefore,
the function d(a, b) = |a − b| is a distance function. ♦

Fig. 1.11 A function with support [−1, 1]

Example 1.2.14 For two elements a and b in the set R of real numbers, assume the function d(a, b) = (a − b)². Then, we have (a − b)² = (b − a)², (a − b)² > 0 when a ≠ b, and (a − b)² = 0 when a = b. Yet, because (a − c)² + (c − b)² = (a − b)² + 2(c − a)(c − b) < (a − b)² when a < c < b, the function d(a, b) = (a − b)² is not a distance function. ♦
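The contrast between Examples 1.2.13 and 1.2.14 is a one-line computation; this sketch (not in the original text) checks the triangle inequality of Definition 1.2.10 at the point a = 0, c = 1, b = 2:

```python
# Triangle inequality check: d1(a,b) = |a - b| satisfies it, while
# d2(a,b) = (a - b)^2 fails, as in Example 1.2.14.
d1 = lambda a, b: abs(a - b)
d2 = lambda a, b: (a - b) ** 2
a, c, b = 0, 1, 2
triangle_d1 = d1(a, b) <= d1(a, c) + d1(c, b)   # 2 <= 1 + 1
triangle_d2 = d2(a, b) <= d2(a, c) + d2(c, b)   # 4 <= 1 + 1, which fails
```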

Definition 1.2.11 (metric space; neighborhood; radius) A set is called a metric


space if a distance is defined for every two points in the set. For a metric space X
with distance function d, the set of all points q such that q ∈ X and d( p, q) < r is
called a neighborhood of p and denoted by Nr ( p), where r is called the radius of
Nr ( p).

Example 1.2.15 For the metric space X = {x : −2 ≤ x ≤ 5} and distance function


d(a, b) = |a − b|, the neighborhood of 0 with radius 1 is N1 (0) = (−1, 1), and
N3 (0) = [−2, 3) is the neighborhood of 0 with radius 3. ♦

Definition 1.2.12 (limit point; closure) A point p is called a limit point of a subset
E of a metric space if E contains at least one point different from p for every
neighborhood of p. The union Ē = E ∪ E L of E and the set E L of all the limit
points of E is called the closure or enclosure of E.

Example 1.2.16 For the metric space X = {x : −2 ≤ x ≤ 5} and distance function


d(a, b) = |a − b|, consider a subset Y = (−1, 2] of X . The set of all the limit points
of Y is [−1, 2] and the closure of Y is (−1, 2] ∪ [−1, 2] = [−1, 2]. ♦

Definition 1.2.13 (support) The closure of the set {x : f (x) = 0} is called the sup-
port of the function f (x).

Example 1.2.17 The support of the function

f(x) = { 1 − |x|,  |x| ≤ 1,
       { 0,        |x| > 1    (1.2.11)

is [−1, 1] as shown in Fig. 1.11. ♦

Example 1.2.18 The value of the function f (x) = sin x is 0 when x = nπ for n
integer. Yet, the support of f (x) is the set R of real numbers. ♦

1.3 Continuity of Functions

When {x_n}_{n=1}^{∞} is a decreasing sequence converging to x with x_n > x, we denote it by x_n ↓ x. When {x_n} is an increasing sequence converging to x with x_n < x, it is denoted by x_n ↑ x. Let us also use x⁻ = lim_{ε↓0}(x − ε) = lim_{ε↑0}(x + ε) and x⁺ = lim_{ε↓0}(x + ε) = lim_{ε↑0}(x − ε). In other words, x⁻ denotes a number smaller than, and arbitrarily close to, x; and x⁺ is a number greater than, and arbitrarily close to, x. For a function f and a point x0, when f(x0) exists, lim_{x→x0} f(x) = f(x0⁻) = f(x0⁺) exists, and f(x0) = lim_{x→x0} f(x), the function f is called continuous at the point x0. When a function is continuous at every point in an interval, the function is called continuous on the interval. Let us now discuss the continuity of functions (Johnsonbaugh and Pfaffenberger 1981; Khaleelulla 1982; Munroe 1971; Olmsted 1961; Rudin 1976; Steen and Seebach 1970) in more detail.

1.3.1 Continuous Functions

Definition 1.3.1 (continuous function) If, for every positive number ε and every
point x₀ in a region S, there exists a positive number δ(x₀, ε) such that
|f(x) − f(x₀)| < ε for all points x in S when |x − x₀| < δ(x₀, ε), then the function
f is called continuous on S.

In other words, for some point x₀ in S and some positive number ε, if there exists
at least one point x in S such that |x − x₀| < δ(x₀, ε) yet |f(x) − f(x₀)| ≥ ε for
every positive number δ(x₀, ε), the function f is not continuous on S.

Example 1.3.1 Consider the function

u(x) = { 0, x ≤ 0,
         1, x > 0.   (1.3.1)

Let x₀ = 0 and ε = 1. For x = δ/2 with a positive number δ, we have |x − x₀| = δ/2 < δ,
yet |u(x) − u(x₀)| = 1 ≥ ε. Thus, u is not continuous on R. ♦
 
Theorem 1.3.1 If f is continuous at x_p ∈ E, g is continuous at f(x_p), and h(x) =
g(f(x)) when x ∈ E, then h is continuous at x_p.

Proof Because g is continuous at f(x_p), for every ε > 0, there exists a number η
such that |g(y) − g(f(x_p))| < ε when |y − f(x_p)| < η for y ∈ f(E). In addition,
because f is continuous at x_p, there exists a number δ such that |f(x) − f(x_p)| < η
when |x − x_p| < δ for x ∈ E. In other words, for every positive number ε, there exists
a number δ such that |h(x) − h(x_p)| = |g(y) − g(f(x_p))| < ε when |x − x_p| < δ
for x ∈ E. Therefore, h is continuous at x_p. ♠

Definition 1.3.2 (uniform continuity) If, for every positive number ε, there exists a
positive number δ(ε) such that |f(x) − f(x₀)| < ε for all points x and x₀ in a region
S when |x − x₀| < δ(ε), then the function f is called uniformly continuous on S.

In other words, if there exist at least one each of x and x₀ in S for a positive number
ε such that |x − x₀| < δ(ε) yet |f(x) − f(x₀)| ≥ ε for every positive number δ(ε),
then the function f is not uniformly continuous on S.
The difference between uniform continuity and continuity lies in the order of
choosing the numbers x₀, δ, and ε. Specifically, for continuity, x₀ and ε are chosen
first and then δ(x₀, ε) is chosen, and thus δ(x₀, ε) is dependent on x₀ and ε. On the
other hand, for uniform continuity, ε is chosen first, δ(ε) is chosen next, and then x₀
is chosen last, in which δ(ε) is dependent only on ε and not on x or x₀. In short, the
dependence of δ on x₀ is the key difference.
When a function f is uniformly continuous, we can make f (x1 ) arbitrarily close
to f (x2 ) for every two points x1 and x2 by moving these two points together. A
uniformly continuous function is always a continuous function, but a continuous
function is not always uniformly continuous. In other words, uniform continuity is
a stronger or more strict concept than continuity.

Example 1.3.2 For the function f(x) = x in S = R, let δ = ε with ε > 0. Then,
when |x − y| < δ, because |f(x) − f(y)| = |x − y| < δ = ε, f(x) = x is uniformly
continuous. As shown in Exercise 1.23, the function f(x) = √x is uniformly
continuous on the interval (0, ∞). The function f(x) = 1/x is uniformly continuous
on every interval (a, ∞) with a > 0. On the other hand, as shown in Exercise 1.22,
it is not uniformly continuous on the interval (0, ∞). The function f(x) = tan x is
continuous but not uniformly continuous on the interval (−π/2, π/2). ♦

Example 1.3.3 In the interval S = (0, ∞), consider f(x) = x². For a positive number
ε and a point x₀ in S, let a = x₀ + 1 and δ = min(1, ε/(2a)). Then, for a point x in
S, when |x − x₀| < δ, we have |x − x₀| < 1 and x < x₀ + 1 = a because δ ≤ 1. We
also have x₀ < a. Now, we have |x² − x₀²| = (x + x₀)|x − x₀| < 2aδ ≤ 2a · ε/(2a) = ε
because δ ≤ ε/(2a), and thus f is continuous. On the other hand, let ε = 1, assume a
positive number δ, and choose x₀ = 1/δ and x = x₀ + δ/2. Then, we have |x − x₀| = δ/2 < δ
but |x² − x₀²| = |(1/δ + δ/2)² − 1/δ²| = 1 + δ²/4 > 1 = ε, implying that f is not uniformly
continuous. Note that, as shown in Exercise 1.21, the function f(x) = x² is uniformly
continuous on a finite interval. ♦
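The failure of uniform continuity in this example can also be checked numerically: for ε = 1, the pair x₀ = 1/δ and x = x₀ + δ/2 stays δ-close while the image gap 1 + δ²/4 never drops below ε. A small Python sketch (our own illustration; the sampled δ values are arbitrary):

```python
# Numerical check that f(x) = x^2 is not uniformly continuous on (0, inf):
# for eps = 1, however small delta is, the points x0 = 1/delta and
# x = x0 + delta/2 are delta-close while |f(x) - f(x0)| = 1 + delta^2/4 > eps.
eps = 1.0
gaps = []
for delta in (1.0, 0.1, 0.01, 0.001):
    x0 = 1.0 / delta
    x = x0 + delta / 2.0
    assert abs(x - x0) < delta        # the points are delta-close ...
    gaps.append(abs(x**2 - x0**2))    # ... yet the image gap stays above eps
print(gaps)
```

Every entry of `gaps` equals 1 + δ²/4 up to rounding, so the gap does not shrink as δ does.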

Theorem 1.3.2 A function f is uniformly continuous on S if

| f (x) − f (y)| ≤ M|x − y| (1.3.2)

for a number M and every x and y in S.

In Theorem 1.3.2, the inequality (1.3.2) and the number M are called the Lipschitz
inequality and Lipschitz constant, respectively.

Example 1.3.4 Consider S = R and f(x) = 3x + 7. For a positive number ε, let
δ = ε/3. Then, we have |f(x) − f(x₀)| = 3|x − x₀| < 3δ = ε when |x − x₀| < δ for
every two points x and x₀ in S. Thus, f is uniformly continuous on S. ♦
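Theorem 1.3.2 can be illustrated with this function: the Lipschitz constant M = 3 yields the single, location-free choice δ(ε) = ε/3. A minimal Python sketch (our own illustration; the sample locations are arbitrary):

```python
# f(x) = 3x + 7 satisfies the Lipschitz inequality |f(x) - f(y)| <= 3 |x - y|,
# so delta = eps / 3 certifies uniform continuity regardless of location.
f = lambda x: 3.0 * x + 7.0
M = 3.0
eps = 0.1
delta = eps / M
points = [-1000.0, -1.0, 0.0, 2.5, 1e6]   # arbitrary sample locations
ok = True
for x0 in points:
    x = x0 + 0.9 * delta                  # any point with |x - x0| < delta
    ok = ok and abs(f(x) - f(x0)) < eps
print(ok)
```

The same δ works at x₀ = 0 and at x₀ = 10⁶, which is exactly the location-independence that distinguishes uniform continuity.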

As a special case of the Heine-Cantor theorem, we have the following theorem:

Theorem 1.3.3 If a function is differentiable and has a bounded derivative, then the
function is uniformly continuous.

1.3.2 Discontinuities
   
Definition 1.3.3 (type 1 discontinuity) When the three values f(x⁺), f(x⁻), and
f(x) all exist and at least one of f(x⁺) and f(x⁻) is different from f(x), the
point x is called a type 1 discontinuity point or a jump discontinuity point, and the
difference f(x⁺) − f(x⁻) is called the jump or saltus of f at x.

Example 1.3.5 The function

f(x) = { x + 1, x > 0,
         0,     x = 0,
         x − 1, x < 0   (1.3.3)

shown in Fig. 1.12 is type 1 discontinuous at x = 0 and the jump is 2. ♦

   
Definition 1.3.4 (type 2 discontinuity) If at least one of f(x⁺) and f(x⁻) does not
exist, then the point x is called a type 2 discontinuity point.

Example 1.3.6 The function

f(x) = { cos(1/x), x ≠ 0,
         0,        x = 0   (1.3.4)

shown in Fig. 1.13 is type 2 discontinuous at x = 0. ♦

Fig. 1.12 An example of a type 1 discontinuous function at x = 0

Fig. 1.13 An example of a type 2 discontinuous function at x = 0: f(x) = cos(1/x) for x ≠ 0 and f(0) = 0

Example 1.3.7 The function

f(x) = { 1, x = rational number,
         0, x = irrational number   (1.3.5)

is type 2 discontinuous at every point x, and the function

f(x) = { x, x = rational number,
         0, x = irrational number   (1.3.6)

is type 2 discontinuous almost everywhere: that is, at all points except x = 0. ♦

Example 1.3.8 The function

f(x) = { 1/q, x = p/q, where p ∈ J and q ∈ J₊ are coprime,
         0,   x = irrational number   (1.3.7)

is⁵ continuous almost everywhere: that is, f is continuous at all points except at
rational numbers. Since f(x⁺) = f(x⁻) = 0 at every point, the discontinuities at the
rational numbers are all type 1 discontinuities. ♦

Example 1.3.9 Show that the function

f(x) = { sin(1/x), x ≠ 0,
         0,        x = 0   (1.3.8)

is type 2 discontinuous at x = 0 and continuous at every x ≠ 0.

Solution Because f(0⁺) and f(0⁻) do not exist, f(x) is type 2 discontinuous at
x = 0. Next, noting that |sin x − sin y| = 2 |sin((x − y)/2)| |cos((x + y)/2)| ≤
2 |sin((x − y)/2)| ≤ |x − y|, we have |sin x − sin y| < ε when |x − y| < δ = ε.
Therefore, sin x is uniformly continuous. In addition, 1/x is continuous at every
x ≠ 0. Thus, from Theorem 1.3.1, f(x) is continuous at every x ≠ 0. ♦

5 This function is called Thomae’s function.



1.3.3 Absolutely Continuous Functions and Singular Functions

Definition 1.3.5 (absolute continuity) Consider a finite collection {(a_k, b_k)}_{k=1}^n of
non-overlapping intervals with a_k, b_k ∈ (−c, c) for a positive number c. If there
exists a number δ = δ(c, ε) such that

Σ_{k=1}^{n} |f(b_k) − f(a_k)| < ε   (1.3.9)

when Σ_{k=1}^{n} |b_k − a_k| < δ for all positive numbers c and ε, then the function f is
called an absolutely continuous function.

Example 1.3.10 The functions f₁(x) = x² and f₂(x) = sin x are both absolutely
continuous, and f₃(x) = 1/x is absolutely continuous for x > 0. ♦

If a function f(x) is absolutely continuous on a finite interval (a, b), then there
exists an integrable function f′(x) satisfying

f(b) − f(a) = ∫_{a}^{b} f′(x) dx,   (1.3.10)

where −∞ < a < b < ∞ and f′(x) is the derivative of f(x) at almost every point.
The converse also holds true. Note that, if f(x) is not absolutely continuous, the
derivative does not satisfy (1.3.10) even when the derivative of f(x) exists at almost
every point.

Theorem 1.3.4 If a function has a bounded derivative almost everywhere and is
integrable on a finite interval, and the right and left derivatives of the function
exist at the points where the derivative does not exist, then the function is absolutely
continuous.

Definition 1.3.6 (singular function) A continuous, but not absolutely continuous,
function is called a singular function.

In other words, a continuous function is either an absolutely continuous function
or a singular function.

Example 1.3.11 Denote by D_{ij} the j-th interval that is removed at the i-th step in
the procedure of obtaining the Cantor set C in Example 1.1.46, where i = 1, 2, . . .
and j = 1, 2, . . . , 2^{i−1}. Draw the 2^n − 1 line segments

y = (2j − 1)/2^i : x ∈ D_{ij}; j = 1, 2, . . . , 2^{i−1}; i = 1, 2, . . . , n   (1.3.11)

Fig. 1.14 The first two functions φ₁(x) and φ₂(x) of the sequence {φₙ(x)}_{n=1}^∞ converging to the Cantor function φ_C(x)

parallel to the x-axis on an (x, y) coordinate plane. Next, draw a straight line each
from the point (0, 0) to the left endpoint of the nearest line segment and from the point
(1, 1) to the right endpoint of the nearest line segment. For every line segment, draw a
straight line from the right endpoint to the left endpoint of the nearest line segment on
the right-hand side. Let the function resulting from this procedure be φ_n(x). Then,
φ_n(x) is continuous on the interval (0, 1), and is composed of 2^n line segments
of height 2^{−n} and slope (3/2)^n connected with 2^n − 1 horizontal line segments.
Figure 1.14 shows φ₁(x) and φ₂(x). The limit

φ_C(x) = lim_{n→∞} φ_n(x)   (1.3.12)

of the sequence {φ_n(x)}_{n=1}^∞ is called a Cantor function or Lebesgue function. ♦
The Cantor function φ_C(x) described in Example 1.3.11 can be expressed as

φ_C(x) = { 0.(c₁/2)(c₂/2) · · · in binary, x ∈ C,
           y shown in (1.3.11),           x ∈ [0, 1] − C   (1.3.13)

when a point x in the Cantor set C discussed in Example 1.1.46 is written as

x = 0.c₁c₂ · · ·   (1.3.14)

in a ternary number. Now, the image of φ_C(x) is a subset of [0, 1]. In addition,
because the number

x = 0.(2b₁)(2b₂) · · ·   (1.3.15)

in ternary is clearly a point in C such that φ_C(x) = y when y ∈ [0, 1] is expressed in a binary
number as y = 0.b₁b₂ · · ·, we have [0, 1] ⊆ φ_C(C). Therefore the range of φ_C(x) is
[0, 1].
Some properties of the Cantor function φC (x) are as follows:

(1) The Cantor function φC (x) is a non-decreasing function with range [0, 1] and
no jump discontinuity. Because there can be no discontinuity except for jump
discontinuities in non-increasing and non-decreasing functions, φC (x) is a con-
tinuous function.
(2) Let E be the set of points represented by (1.1.41) and (1.1.42). Then, the function
φC (x) is an increasing function at x ∈ C − E and is constant in some neighbor-
hood of every point x ∈ [0, 1] − C.
(3) As observed in Example 1.1.46, the length of [0, 1] − C is 1, and φC (x) is
constant at x ∈ [0, 1] − C. Therefore, the derivative of φC (x) is 0 almost every-
where.
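The ternary-to-binary description in (1.3.13) gives a direct way to evaluate φ_C numerically: read off ternary digits of x, map digit 2 to binary digit 1, and stop at the first digit 1, which places x inside a removed middle-third interval where φ_C is constant. A Python sketch (our own illustration; the function name and the 40-digit precision cutoff are arbitrary choices):

```python
# Evaluate the Cantor function phi_C(x) from the ternary expansion of x.
# Ternary digits 0 and 2 become binary digits 0 and 1; the first ternary
# digit 1 contributes a final binary digit 1 and ends the expansion,
# since x then lies in a removed interval on which phi_C is constant.
def cantor(x, digits=40):
    if x >= 1.0:
        return 1.0
    y, scale = 0.0, 0.5
    for _ in range(digits):
        x *= 3.0
        d = int(x)          # next ternary digit of the original x
        x -= d
        if d == 1:          # x lies in a removed middle-third interval
            return y + scale
        y += scale * (d // 2)
        scale /= 2.0
    return y

vals = [cantor(t) for t in (0.0, 1/3, 0.5, 2/3, 1/4, 1.0)]
print(vals)
```

The values reproduce the known facts φ_C(1/3) = φ_C(1/2) = φ_C(2/3) = 1/2 (the function is constant on the removed interval [1/3, 2/3]) and φ_C(1/4) = 1/3.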

Example 1.3.12 (Salem 1943) The Cantor function φC (x) considered in Example
1.3.11 is a non-decreasing singular function. Obtain an increasing singular function.

Solution Consider the line segment PQ connecting P(x, y) and Q(x + Δₓ, y + Δ_y)
on a two-dimensional plane, where Δₓ > 0 and Δ_y > 0. Let the point R have
the coordinate (x + Δₓ/2, y + λ₀Δ_y) with 0 < λ₀ < 1. Denote the replacement of the
line segment PQ by the two line segments PR and RQ as 'transformation of PQ via
T(λ₀)'. Now, starting from the line segment OA between the origin O(0, 0) and the
point A(1, 1), consider a sequence {f_n(x)}_{n=0}^∞ defined by

f₀(x) = line segment OA,
f₁(x) = transformation of f₀(x) via T(λ₀),
f₂(x) = transformation of each of the two line segments composing f₁(x) via T(λ₀),
f₃(x) = transformation of each of the four line segments composing f₂(x) via T(λ₀),
. . .

In other words, f_m(x) is increasing from f_m(0) = 0 to f_m(1) = 1 and is composed
of 2^m line segments with the x coordinates of the end points {k/2^m}_{k=1}^{2^m − 1}. Figure 1.15
shows {f_m(x)}_{m=0}^{3} with λ₀ = 0.7.
Assume we represent the x coordinate of the end points of the line segments
composing f_m(x) as x = Σ_{j=1}^{m} θ_j/2^j = 0.θ₁θ₂ · · · θ_m in a binary number, where θ_j ∈
{0, 1}. Then, the y coordinate can be written as

y = Σ_{k=1}^{m} θ_k λ₀ Π_{j=1}^{k−1} λ_{θ_j},   (1.3.16)

Fig. 1.15 The first four functions in the sequence {f_m(x)}_{m=0}^∞ converging to an increasing singular function (λ₀ = 0.7 = 1 − λ₁)


where λ₁ = 1 − λ₀ and we assume Π_{j=1}^{k−1} λ_{θ_j} = 1 when k = 1. The limit
f(x) = lim_{m→∞} f_m(x) of the sequence {f_m(x)}_{m=1}^∞ is an increasing singular
function. ♦
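The breakpoint values of f_m can be tabulated directly from the binary digits of x. The Python sketch below (our own illustration; the helper name and the choice m = 2 are arbitrary) evaluates y = Σ_k θ_k λ₀ Π_{j<k} λ_{θ_j}, which reproduces the T(λ₀) construction: for instance, f₁(1/2) = λ₀ and f₂(1/4) = λ₀²:

```python
# Breakpoint y coordinates of f_m: at x = 0.theta_1 ... theta_m (binary),
# y = sum_k theta_k * lambda_0 * prod_{j<k} lambda_{theta_j}, lambda_1 = 1 - lambda_0.
lam0 = 0.7
lam = (lam0, 1.0 - lam0)

def f_m(k, m):
    """y coordinate of the breakpoint x = k / 2**m of f_m (0 <= k <= 2**m)."""
    if k == 2 ** m:
        return 1.0
    theta = [(k >> (m - j)) & 1 for j in range(1, m + 1)]  # binary digits of k/2^m
    y, prod = 0.0, 1.0
    for t in theta:
        if t:
            y += lam0 * prod
        prod *= lam[t]
    return y

ys = [f_m(k, 2) for k in range(5)]   # breakpoints of f_2 at k/4, k = 0..4
print(ys)
```

With λ₀ = 0.7 this gives the f₂ breakpoints 0, 0.49, 0.7, 0.91, 1, i.e., exactly the values obtained by applying T(0.7) twice to the segment OA.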

Let us note that the convolution6 of two absolutely continuous functions always
results in an absolutely continuous function while the convolution of two singu-
lar functions may sometimes result not in a singular function but in an absolutely
continuous function (Romano and Siegel 1986).

1.4 Step, Impulse, and Gamma Functions

In this section, we describe the properties of unit step, impulse (Challifour 1972;
Gardner 1990; Gelfand and Moiseevich 1964; Hoskins and Pinto 2005; Kanwal 2004;
Lighthill 1980), and gamma functions (Artin 1964; Carlson 1977; Zayed 1996) in
detail.
⁶ The integral ∫_{−∞}^{∞} g(x − v) f(v) dv = ∫_{−∞}^{∞} g(v) f(x − v) dv is called the convolution of f and g, and is usually denoted by f ∗ g or g ∗ f.

1.4.1 Step Function

Definition 1.4.1 (unit step function) The function

u(x) = { 0, x < 0,
         1, x > 0   (1.4.1)

is called the unit step function, step function, or Heaviside function and is also
denoted by H(x).

In (1.4.1), the value u(0) is not defined: usually, u(0) is chosen as 0, 1/2, 1, or any
value u₀ between 0 and 1. Figure 1.16 shows the unit step function with u(0) = 1/2.
In some cases, the unit step function with value α at x = 0 is denoted by u_α(x), with
u₋(x), u(x), and u₊(x) denoting the cases of α = 0, 1/2, and 1, respectively. The unit
step function can be regarded as the integral of the impulse or delta function that will
be considered in Sect. 1.4.2.
The unit step function u(x) with u(0) = 1/2 can be represented as the limit

u(x) = lim_{α→∞} [1/2 + (1/π) tan⁻¹(αx)]   (1.4.2)

or

u(x) = lim_{α→∞} 1/(1 + e^{−αx})   (1.4.3)

of a sequence of continuous functions. We also have

u(x) = 1/2 + (1/(2π)) ∫_{−∞}^{∞} (sin(ωx)/ω) dω.   (1.4.4)
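The limits (1.4.2) and (1.4.3) are easy to see numerically: for a large fixed α, both expressions are near 0 for x < 0, near 1 for x > 0, and exactly 1/2 at x = 0. A small Python sketch (our own illustration; α = 500 is an arbitrary large value):

```python
import math

# Smooth approximations (1.4.2) and (1.4.3) to the unit step function
def u_arctan(x, alpha):
    return 0.5 + math.atan(alpha * x) / math.pi

def u_logistic(x, alpha):
    return 1.0 / (1.0 + math.exp(-alpha * x))

alpha = 500.0
approx = [(u_arctan(x, alpha), u_logistic(x, alpha)) for x in (-0.1, 0.0, 0.1)]
print(approx)
```

The logistic form converges exponentially fast in α, while the arctangent form approaches 0 and 1 only at rate 1/(αx); both give exactly 1/2 at x = 0.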

Fig. 1.16 The unit step function u(x) with u(0) = 1/2

As we have observed in (1.4.2) and (1.4.3), the unit step function can be defined
alternatively by first introducing a step-convergent sequence, also called the Heaviside
convergent sequence or Heaviside sequence. Specifically, employing the notation⁷

⟨a(x), b(x)⟩ = ∫_{−∞}^{∞} a(x) b(x) dx,   (1.4.5)

a sequence {h_m(x)}_{m=1}^∞ of real functions that satisfy

lim_{m→∞} ⟨f(x), h_m(x)⟩ = ∫_{0}^{∞} f(x) dx = ⟨f(x), u(x)⟩   (1.4.6)

for every sufficiently smooth function f(x) on the interval −∞ < x < ∞ is called
a step-convergent sequence or a step sequence, and its limit

lim_{m→∞} h_m(x)   (1.4.7)

is called the unit step function.


Example 1.4.1 If we let u(0) = 1/2, then u(x) = 1 − u(−x) and u(a − x) = 1 − u(x − a). ♦
Example 1.4.2 We can obtain⁸

u(x − |a|) = u(x + a) u(x − a) = u(x² − a²) u(x),   (1.4.8)

u((x − a)(x − b)) = { 0, min(a, b) < x < max(a, b),
                     1, x < min(a, b) or x > max(a, b),   (1.4.9)

and

u(x²) = u(|x|) = { u(0), x = 0,
                   1,    x ≠ 0   (1.4.10)

from the definition of the unit step function. ♦



Example 1.4.3 Let u(0) = 1/2. Then, the min function min(t, s) = { t, t ≤ s; s, t ≥ s }
can be expressed as

min(t, s) = t u(s − t) + s u(t − s).   (1.4.11)

In addition, we have

min(t, s) = ∫_{0}^{t} u(s − y) dy   (1.4.12)

for t ≥ 0 and s ≥ 0. Similarly, the max function max(t, s) = { t, t ≥ s; s, t ≤ s } can be
expressed as

max(t, s) = t u(t − s) + s u(s − t).   (1.4.13)

Recollecting (1.4.12), we also have max(t, s) = t + s − min(t, s), i.e.,

max(t, s) = t + s − ∫_{0}^{t} u(s − y) dy   (1.4.14)

for t ≥ 0 and s ≥ 0. ♦

⁷ When we also take complex functions into account, the notation ⟨a(x), b(x)⟩ is defined as ⟨a(x), b(x)⟩ = ∫_{−∞}^{∞} a(x) b*(x) dx.
⁸ In (1.4.8), it is implicitly assumed that u(0) = 0 or 1.
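The identities (1.4.11) and (1.4.13) are easy to spot-check once u(0) = 1/2 is fixed; the choice u(0) = 1/2 is what makes the t = s case come out right. A minimal Python sketch (our own illustration; the sample pairs are arbitrary):

```python
# min and max via the unit step function, (1.4.11) and (1.4.13), with u(0) = 1/2
def u(x):
    return 0.5 if x == 0 else (1.0 if x > 0 else 0.0)

def min_u(t, s):
    return t * u(s - t) + s * u(t - s)

def max_u(t, s):
    return t * u(t - s) + s * u(s - t)

pairs = [(2.0, 5.0), (5.0, 2.0), (-1.0, 3.0), (4.0, 4.0)]
ok = all(min_u(t, s) == min(t, s) and max_u(t, s) == max(t, s) for t, s in pairs)
print(ok)
```

Note the pair (4, 4): each step contributes u(0) = 1/2, so t·(1/2) + s·(1/2) = t as required.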

The unit step function is also useful in expressing piecewise continuous functions
as single-line formulas.

Example 1.4.4 The function

F(x) = { x², 0 < x < 1,
         3,  1 < x < 2,
         0,  otherwise   (1.4.15)

can be written as F(x) = x² u(x) − (x² − 3) u(x − 1) − 3u(x − 2). ♦
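The single-line expression above can be verified pointwise away from the breakpoints, where F in (1.4.15) is left undefined. A brief Python sketch (our own illustration; the sample points and the choice u(0) = 1/2 are arbitrary):

```python
# Check that the step-function formula reproduces the piecewise definition (1.4.15)
def u(x):
    return 0.5 if x == 0 else (1.0 if x > 0 else 0.0)

def F_piecewise(x):
    if 0 < x < 1:
        return x * x
    if 1 < x < 2:
        return 3.0
    return 0.0

def F_steps(x):
    return x * x * u(x) - (x * x - 3.0) * u(x - 1.0) - 3.0 * u(x - 2.0)

xs = [-0.5, 0.25, 0.5, 0.99, 1.5, 1.99, 2.5]   # points away from the breakpoints
ok = all(abs(F_steps(x) - F_piecewise(x)) < 1e-12 for x in xs)
print(ok)
```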

Example 1.4.5 Consider the function

h_m(x) = { 1,          x > 1/(2m),
           mx + 1/2,  −1/(2m) ≤ x ≤ 1/(2m),
           0,          x < −1/(2m)   (1.4.16)

shown in Fig. 1.17. Then, we have lim_{m→∞} ∫_{−∞}^{∞} f(x) h_m(x) dx =
lim_{m→∞} { ∫_{−1/(2m)}^{1/(2m)} (mx + 1/2) f(x) dx + ∫_{1/(2m)}^{∞} f(x) dx } =
∫_{0}^{∞} f(x) dx = ⟨f(x), u(x)⟩ because |∫_{−1/(2m)}^{1/(2m)} (1/2) f(x) dx| ≤
(1/(2m)) max_{|x| ≤ 1/(2m)} |f(x)| → 0 and |m ∫_{−1/(2m)}^{1/(2m)} x f(x) dx| ≤
max_{|x| ≤ 1/(2m)} |x f(x)| → 0 when m → ∞ and f(0) < ∞. In other words, the
sequence {h_m(x)}_{m=1}^∞ is a step-convergent sequence and its limit is
lim_{m→∞} h_m(x) = u(x). ♦
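The convergence claimed in Example 1.4.5 can also be observed numerically. In the Python sketch below (our own illustration; the choice f(x) = e^{−x}, the truncation interval, and the grid size are arbitrary), the trapezoidal approximation of ⟨f, h_m⟩ approaches ∫₀^∞ e^{−x} dx = 1 as m grows:

```python
import math

# One member of the step-convergent sequence (1.4.16)
def h(x, m):
    if x < -0.5 / m:
        return 0.0
    if x > 0.5 / m:
        return 1.0
    return m * x + 0.5

def inner(f, m, lo=-20.0, hi=20.0, n=200000):
    # trapezoidal approximation of <f, h_m> over [lo, hi]
    dx = (hi - lo) / n
    s = 0.5 * (f(lo) * h(lo, m) + f(hi) * h(hi, m))
    for i in range(1, n):
        x = lo + i * dx
        s += f(x) * h(x, m)
    return s * dx

f = lambda x: math.exp(-x)          # smooth, with integral 1 over (0, inf)
errs = [abs(inner(f, m) - 1.0) for m in (1, 10, 100)]
print(errs)
```

The error shrinks roughly like 1/m², consistent with the two vanishing bounds in the example.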

The unit step function we have described so far is defined in the continuous space.
In the discrete space, the unit step function can similarly be defined.

Fig. 1.17 A function in the step-convergent sequence {h_m(x)}_{m=1}^∞

Definition 1.4.2 (unit step function in discrete space) The function

ũ(x) = { 1, x = 0, 1, . . . ,
         0, x = −1, −2, . . .   (1.4.17)

is called the unit step function in discrete space.

Note that, unlike the unit step function u(x) in continuous space for which the
value u(0) is not defined uniquely, the value ũ(0) is defined uniquely as 1. In addition,
for any non-zero real number a, u(|a|x) is equal to u(x) except possibly at x = 0,
while ũ(|a|x) and ũ(x) are different⁹ at infinitely many points when |a| < 1.

1.4.2 Impulse Function

Although an impulse function is also called a generalized function or a distribution,


we will use the terms impulse function and generalized function in this book and
reserve the term distribution for another concept later in Chap. 2.
An impulse function can be introduced in three ways. The first one is to define
an impulse function as the symbolic derivative of the unit step function. The second
way is to define an impulse function via basic properties and the third is to take the
limit of an impulse-convergent sequence.

1.4.2.1 Definitions

As shown in Fig. 1.18, the ramp function

r_a(t) = { 0,   t ≤ 0,
           t/a, 0 ≤ t ≤ a,
           1,   t ≥ a   (1.4.18)
9 In this case, we assume that ũ(x) is defined to be 0 when x is not an integer.



Fig. 1.18 Ramp functions r₂(t), r₁(t), and r₀.₅(t)
Fig. 1.18 Ramp functions r2 (t), r1 (t), and r0.5 (t)

is a continuous function, but not differentiable everywhere: it is differentiable
everywhere except at t = 0 and t = a. Similarly, the rectangular function

p_a(t) = { 0,   t < 0 or t > a,
           1/a, 0 < t < a   (1.4.19)

shown in Fig. 1.19, is not a continuous function, and therefore not differentiable.
Yet, it is continuous and differentiable everywhere except at t = 0 and t = a. In addition,
the rectangular function is the derivative of the ramp function almost everywhere:
specifically, p_a(t) = (d/dt) r_a(t) for t ≠ 0, a. As we can observe in Fig. 1.20, the limit
of the ramp function r_a(t) as a → 0 is the unit step function.
Consider the derivative of the unit step function u(t), the limit of the ramp
function r_a(t). The order of operations is not always interchangeable in general:
yet, we interchange the order of the derivative and limit of r_a(t). Specifically,
as shown in Figs. 1.21 and 1.22, the limit of the derivative of r_a(t) can
be regarded as the derivative of the limit of r_a(t). In other words, we can
imagine (d/dt) u(t) = (d/dt) lim_{a→0} r_a(t) = lim_{a→0} (d/dt) r_a(t) = lim_{a→0} p_a(t).
Based on this simple yet useful description, let us introduce the impulse function in more detail.

Fig. 1.19 Rectangular functions p₂(t), p₁(t), and p₀.₅(t)

Fig. 1.20 Limit of the sequence {r_a(t)} of ramp functions: from a ramp function to the unit step function

Fig. 1.21 Ramp function → (limit) step function → (differentiation) impulse function
Definition 1.4.3 (impulse function) The derivative

δ(x) = du(x)/dx   (1.4.20)

of the unit step function u(x) is called an impulse function or a generalized function.
As we have already observed, the unit step function u(x) is not continuous at
x = 0 and therefore not differentiable. This implies that (1.4.20) is technically not
defined at x = 0: (1.4.20) can then be interpreted as “Let us define the ‘symbolic’
differentiation of u(x) as δ(x).” Clearly, the impulse function δ(x) is not defined at
x = 0: it is often assumed that10 δ(0) → ∞.
Let us next consider the second definition of the impulse function.

Definition 1.4.4 (impulse function) A function δ(x) satisfying the conditions

δ(x − c) = 0 for x ≠ c   (1.4.21)

and

∫_{α}^{β} δ(x − c) dx = { 1, if α < c < β,
                         0, if α = β = c, c < α < β, or α < β < c   (1.4.22)

is called an impulse function.



When c = α or c = β, the value of the integral ∫_{α}^{β} δ(x − c) dx is assumed to be 1/2 in
some applications. For a sufficiently smooth function f, the second condition (1.4.22)
can be expressed as

∫_{−∞}^{∞} δ(x − c) f(x) dx = f(c),   (1.4.23)

which is called the sifting or reproducing property of the impulse function.

10 As we shall see shortly in (1.4.27), we have δ(0) → −∞ in some cases.



Fig. 1.22 Ramp function → (differentiation) rectangular function → (limit) impulse function

It is rather mathematically difficult to find a function satisfying the two conditions
(1.4.21) and (1.4.22). For example, the integral over a non-zero interval of a function
which is 0 except at one point will be zero, which contradicts the two conditions in
Definition 1.4.4 if we confine ourselves to technicalities. On the other hand, some
sequences of functions satisfy (1.4.23) as we can easily see¹¹ in

lim_{m→∞} ∫_{−∞}^{∞} f(x) (sin(mx)/(πx)) dx = f(0),   (1.4.24)

for instance. Based on this observation, we can define the impulse function via a
proper sequence of functions. Let us first introduce the concept of impulse-convergent
sequence similarly as we introduced the step-convergent sequence.
Definition 1.4.5 (impulse-convergent sequence) When

lim_{m→∞} ∫_{−∞}^{∞} f(x) s_m(x) dx = f(0)   (1.4.25)

is satisfied for every sufficiently smooth function f(x) over the interval −∞ < x <
∞, the sequence {s_m(x)}_{m=1}^∞ is called an impulse sequence, an impulse-convergent
sequence, or a delta-convergent sequence.
Example 1.4.6 The sequences {sin(mx)/(πx)}_{m=1}^∞, {(m/π)/(1 + m²x²)}_{m=1}^∞,
{(m/2) e^{−m|x|}}_{m=1}^∞, {2m/(π(e^{mx} + e^{−mx}))}_{m=1}^∞, and
{√(m/π) exp(−mx²)}_{m=1}^∞ are all impulse-convergent sequences. ♦

Example 1.4.7 The sequences {s_m(x)}_{m=1}^∞ of functions with

s_m(x) = m u(1/(2m) − |x|),   (1.4.26)

s_m(x) = { −m, |x| < 1/(2m),
           2m, 1/(2m) ≤ |x| ≤ 1/m,
           0,  |x| > 1/m,   (1.4.27)

¹¹ Exercise 1.24 will show this result.



and

s_m(x) = [(2m + 1)!/(2^{2m+1} (m!)²)] (1 − x²)^m u(1 − |x|)   (1.4.28)

are all delta-convergent sequences. ♦

Example 1.4.8 The sequences {s_m(x)}_{m=1}^∞ of functions defined by

s_m(x) = { 4m²x + 2m,  −1/(2m) ≤ x ≤ 0,
           −4m²x + 2m, 0 < x ≤ 1/(2m),
           0,          |x| > 1/(2m)   (1.4.29)

and

s_m(x) = { m²x + m,  −1/m ≤ x ≤ 0,
           −m²x + m, 0 ≤ x ≤ 1/m,
           0,        |x| > 1/m   (1.4.30)

are both delta-convergent sequences. ♦


∞
Example 1.4.9 For a non-negative function f(x) with ∫_{−∞}^{∞} f(x) dx = 1, the
sequence {m f(mx)}_{m=1}^∞ is an impulse sequence. ♦
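Example 1.4.9 can be checked numerically with the Gaussian choice f(x) = e^{−x²}/√π: the inner product ⟨φ, m f(m·)⟩ should approach φ(0) as m grows. A small Python sketch (our own illustration; the smooth test integrand φ and the grid parameters are arbitrary choices):

```python
import math

def g(x):
    # a non-negative function integrating to 1 over the real line
    return math.exp(-x * x) / math.sqrt(math.pi)

def sift(phi, m, lo=-10.0, hi=10.0, n=200000):
    # trapezoidal approximation of the integral of phi(x) * m * g(m x)
    dx = (hi - lo) / n
    total = 0.5 * (phi(lo) * m * g(m * lo) + phi(hi) * m * g(m * hi))
    for i in range(1, n):
        x = lo + i * dx
        total += phi(x) * m * g(m * x)
    return total * dx

phi = lambda x: math.cos(x) + x       # a smooth test integrand with phi(0) = 1
vals = [sift(phi, m) for m in (1, 4, 16)]
print(vals)
```

As m grows, the mass of m·g(mx) concentrates at 0 and the integral approaches φ(0) = 1.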

The impulse function can now be defined based on the impulse-convergent
sequence as follows:

Definition 1.4.6 (impulse function) The limit

lim_{m→∞} s_m(x) = δ(x)   (1.4.31)

of a delta-convergent sequence {s_m(x)}_{m=1}^∞ is called an impulse function.

Example 1.4.10 Sometimes, we have δ(0) → −∞ as in (1.4.27). ♦

1.4.2.2 Properties

We have

⟨δ(x), φ(x)⟩ = φ(0),   (1.4.32)
⟨δ^{(k)}(x), φ(x)⟩ = (−1)^k ⟨δ(x), φ^{(k)}(x)⟩,   (1.4.33)
⟨f(x)δ(x), φ(x)⟩ = ⟨δ(x), f(x)φ(x)⟩,   (1.4.34)

and

⟨δ(ax), φ(x)⟩ = (1/|a|) ⟨δ(x), φ(x/a)⟩   (1.4.35)

when φ and f are sufficiently smooth functions.


Letting a = −1 in (1.4.35), we get ⟨δ(−x), φ(x)⟩ = ⟨δ(x), φ(−x)⟩ = φ(0). Therefore

δ(−x) = δ(x)   (1.4.36)

from (1.4.32). In other words, δ(x) is an even function.

Example 1.4.11 For the minimum function min(t, s) = t u(s − t) + s u(t − s)
introduced in (1.4.11), we get (∂/∂t) min(t, s) = u(s − t) and

(∂²/∂s∂t) min(t, s) = δ(s − t) = δ(t − s)   (1.4.37)

by noting that t δ(s − t) − s δ(t − s) = t δ(s − t) − t δ(s − t) = 0. Similarly, for
max(t, s) = t u(t − s) + s u(s − t) introduced in (1.4.13), we have (∂/∂t) max(t, s) =
u(t − s) and

(∂²/∂s∂t) max(t, s) = −δ(s − t) = −δ(t − s)   (1.4.38)

by noting again that t δ(s − t) − s δ(t − s) = t δ(s − t) − t δ(s − t) = 0. ♦

Let us next introduce the concept of a test function and then consider the product
of a function and the n-th order derivative δ (n) (x) of the impulse function.

Definition 1.4.7 (test function) A real function φ satisfying the two conditions below
is called a test function.
(1) The function φ(x) is differentiable infinitely many times at every point x =
(x1 , x2 , . . . , xn ).
(2) There exists a finite number A such that φ(x) = 0 for every point x =
(x1 , x2 , . . . , xn ) satisfying x12 + x22 + · · · + xn2 > A.

A function satisfying condition (1) above is often called a C ∞ function. Definition


1.4.7 allows a test function in the n-dimensional space: however, we will consider
mainly one-dimensional test functions in this book.
Example 1.4.12 The function φ(x) = exp(−a²/(a² − x²)) u(a − |x|) shown in Fig. 1.23
is a test function. ♦

Fig. 1.23 A test function φ(x) = exp(−a²/(a² − x²)) u(a − |x|)

Theorem 1.4.1 If a function f is differentiable n times consecutively, then

f(x) δ^{(n)}(x − b) = (−1)^n Σ_{k=0}^{n} (−1)^k ₙCₖ f^{(n−k)}(b) δ^{(k)}(x − b),   (1.4.39)

where ₙCₖ denotes the binomial coefficient.

Proof Let b = 0 and assume a test function φ(x). We get ⟨f(x)δ^{(n)}(x), φ(x)⟩ =
∫_{−∞}^{∞} f(x) δ^{(n)}(x) φ(x) dx = ∫_{−∞}^{∞} {f(x)φ(x)} δ^{(n)}(x) dx, i.e.,

⟨f(x)δ^{(n)}(x), φ(x)⟩ = [{f(x)φ(x)} δ^{(n−1)}(x)]_{−∞}^{∞} − ∫_{−∞}^{∞} {f(x)φ(x)}′ δ^{(n−1)}(x) dx
                       = · · ·
                       = (−1)^n ∫_{−∞}^{∞} {f(x)φ(x)}^{(n)} δ(x) dx   (1.4.40)

because φ(x) = 0 for x → ±∞. Now, (1.4.40) can be written as

⟨f(x)δ^{(n)}(x), φ(x)⟩ = (−1)^n [ f^{(n)}(0) ⟨δ(x), φ(x)⟩
                       + (−1)^{−1} n f^{(n−1)}(0) ⟨δ^{(1)}(x), φ(x)⟩
                       + (−1)^{−2} (n(n − 1)/2!) f^{(n−2)}(0) ⟨δ^{(2)}(x), φ(x)⟩
                       + · · ·
                       + (−1)^{−n} f(0) ⟨δ^{(n)}(x), φ(x)⟩ ]   (1.4.41)

using (1.4.33) because {f(x)φ(x)}^{(n)} = Σ_{k=0}^{n} ₙCₖ f^{(n−k)}(x) φ^{(k)}(x). The result (1.4.41)
is the same as the symbolic expression (1.4.39). ♠

The result (1.4.39) implies that the product of a sufficiently smooth function f(x)
and δ^{(n)}(x − b) can be expressed as a linear combination of {δ^{(k)}(x − b)}_{k=0}^{n}, with
the coefficient of δ^{(k)}(x − b) being the product of the number (−1)^{n−k} ₙCₖ and¹² the
value f^{(n−k)}(b) of f^{(n−k)}(x) at x = b.

Example 1.4.13 From (1.4.39), we have

f (x)δ(x − b) = f (b)δ(x − b) (1.4.42)

when n = 0. ♦

Example 1.4.14 Rewriting (1.4.39) specifically for easier reference, we have

f(x) δ′(x − b) = −f′(b) δ(x − b) + f(b) δ′(x − b),   (1.4.43)
f(x) δ″(x − b) = f″(b) δ(x − b) − 2f′(b) δ′(x − b) + f(b) δ″(x − b),   (1.4.44)

and

f(x) δ‴(x − b) = −f‴(b) δ(x − b) + 3f″(b) δ′(x − b)
                − 3f′(b) δ″(x − b) + f(b) δ‴(x − b)   (1.4.45)

when n = 1, 2, and 3, respectively. ♦

Example 1.4.15 From (1.4.43), we get δ′(x) sin x = −(cos 0) δ(x) + (sin 0) δ′(x)
= −δ(x). ♦

Theorem 1.4.2 For non-negative integers m and n, we have x^m δ^{(n)}(x) = 0 and
x^m δ^{(n)}(x) = (−1)^m [n!/(n − m)!] δ^{(n−m)}(x) when m > n and m ≤ n, respectively.

Theorem 1.4.2 can be obtained directly from (1.4.39).

Theorem 1.4.3 The impulse function δ(f(x)) of a function f can be expressed as

δ(f(x)) = Σ_{m=1}^{n} δ(x − x_m)/|f′(x_m)|,   (1.4.46)

where {x_m}_{m=1}^{n} denotes the real simple zeroes of f.

Proof Assume that the function f has one real simple zero x₁, and consider a sufficiently
small interval I_{x₁} = (α, β) with α < x₁ < β. Because x₁ is the simple zero of f, we
have f′(x₁) ≠ 0. If f′(x₁) > 0, then f(x) increases from f(α) to f(β) as x moves
from α to β. Consequently, u(f(x)) = u(x − x₁) and (d/dx) u(f(x)) = δ(x − x₁)
on the interval I_{x₁}. On the other hand, we have (d/dx) u(f(x)) =
[du(f(x))/df(x)] [df(x)/dx] = δ(f(x)) f′(x), which acts as δ(f(x)) f′(x₁) at the zero of f.
Thus, we get

δ(f(x)) = δ(x − x₁)/f′(x₁).   (1.4.47)

Similarly, if f′(x₁) < 0, we have u(f(x)) = u(x₁ − x) and δ(f(x)) = −δ(x − x₁)/f′(x₁)
on the interval I_{x₁}. In other words,

δ(f(x)) = δ(x − x₁)/|f′(x₁)|.   (1.4.48)

Extending (1.4.48) to all the real simple zeroes of f, we get (1.4.46). ♠

¹² Note that (−1)^{n+k} = (−1)^{n−k}.

Example 1.4.16 From (1.4.46), we can get

δ(ax + b) = (1/|a|) δ(x + b/a)   (1.4.49)

and δ(x³ + 3x) = (1/3) δ(x). ♦

Example 1.4.17 Based on (1.4.46), it can be shown that δ((x − a)(x − b)) =
(1/(b − a)) {δ(x − a) + δ(x − b)} when b > a, δ(tan x) = δ(x) when −π/2 < x < π/2,
and δ(cos x) = δ(x − π/2) when 0 < x < π. ♦
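Theorem 1.4.3 can be illustrated numerically by replacing δ with a member of a delta-convergent sequence, say the Gaussian sequence of Example 1.4.6, and integrating φ(x) s_m(f(x)) on a grid. In the Python sketch below (our own illustration; the test function φ, the zero locations, m = 200, and the grid are arbitrary choices), the result approaches Σ_m φ(x_m)/|f′(x_m)|:

```python
import math

def s(y, m):
    # Gaussian delta-convergent sequence of Example 1.4.6, evaluated at y
    return m * math.exp(-(m * y) ** 2) / math.sqrt(math.pi)

def delta_of_f(phi, f, m, lo=-10.0, hi=10.0, n=200000):
    # trapezoidal approximation of the integral of phi(x) * s_m(f(x))
    dx = (hi - lo) / n
    total = 0.5 * (phi(lo) * s(f(lo), m) + phi(hi) * s(f(hi), m))
    for i in range(1, n):
        x = lo + i * dx
        total += phi(x) * s(f(x), m)
    return total * dx

f = lambda x: (x - 1.0) * (x - 3.0)     # simple zeroes at 1 and 3, |f'| = 2 at both
phi = lambda x: math.exp(-0.5 * (x - 1.0) ** 2)
expected = (phi(1.0) + phi(3.0)) / 2.0  # (1.4.46) with |f'(1)| = |f'(3)| = 2
approx = delta_of_f(phi, f, m=200)
print(approx, expected)
```

The agreement is to within O(1/m²) corrections coming from the curvature of f near its zeroes.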


For the function¹³ δ′(f(x)) = [dδ(v)/dv]_{v=f(x)}, or

δ′(f(x)) = dδ(f(x))/df(x),   (1.4.50)

we similarly have the following theorem:

Theorem 1.4.4 We have

δ′(f(x)) = Σ_{m=1}^{n} (1/|f′(x_m)|) [ δ′(x − x_m)/f′(x_m) + (f″(x_m)/{f′(x_m)}²) δ(x − x_m) ],   (1.4.51)

where {x_m}_{m=1}^{n} denote the real simple zeroes of f(x).

Proof We recollect that

δ′(x − x_m)/f′(x) = (f″(x_m)/{f′(x_m)}²) δ(x − x_m) + (1/f′(x_m)) δ′(x − x_m)   (1.4.52)

from (1.4.43) because [1/f′(x)]′ |_{x=x_m} = −f″(x)/{f′(x)}² |_{x=x_m} = −f″(x_m)/{f′(x_m)}².
Recollect also that

(d/dx) δ(f(x)) = Σ_{m=1}^{n} δ′(x − x_m)/|f′(x_m)|   (1.4.53)

from (1.4.46). Then, because δ′(f) = dδ(f)/df = (dx/df) (dδ(f)/dx), we have

δ′(f(x)) = (1/f′(x)) Σ_{m=1}^{n} δ′(x − x_m)/|f′(x_m)|   (1.4.54)

from (1.4.53). Now, employing (1.4.52) in (1.4.54) results in

δ′(f(x)) = Σ_{m=1}^{n} (1/|f′(x_m)|) [ (f″(x_m)/{f′(x_m)}²) δ(x − x_m) + δ′(x − x_m)/f′(x_m) ],   (1.4.55)

completing the proof. ♠

¹³ Note that g′(f(x)) = [dg(y)/dy]_{y=f(x)} ≠ (d/dx) g(f(x)). In other words, g′(f(x)) denotes [dg(y)/dy]_{y=f(x)} or dg(f)/df, but not (d/dx) g(f(x)).

When the real simple zeroes of a sufficiently smooth function f(x) are {x_m}_{m=1}^{n},
Theorem 1.4.3 indicates that δ(f(x)) can be expressed as a linear combination
of {δ(x − x_m)}_{m=1}^{n}, and Theorem 1.4.4 similarly indicates that δ′(f(x)) can be
expressed as a linear combination of {δ(x − x_m), δ′(x − x_m)}_{m=1}^{n}.

Example 1.4.18 The function f(x) = (x − 1)(x − 2) has two simple zeroes x = 1
and x = 2. We thus have δ′((x − 1)(x − 2)) = 2δ(x − 1) + 2δ(x − 2) − δ′(x − 1) +
δ′(x − 2) because f′(1) = −1, f″(1) = 2, f′(2) = 1, and f″(2) = 2. ♦

Example 1.4.19 The function $f(x) = \sinh 2x$ has one simple zero $x = 0$. Then, we get $\delta'(\sinh 2x) = \frac{1}{2}\cdot\frac{1}{2}\,\delta'(x) + 0 = \frac{1}{4}\,\delta'(x)$ from $f'(0) = 2\cosh 0 = 2$ and $f''(0) = 4\sinh 0 = 0$. ♦
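Theorem 1.4.4 can be sanity-checked numerically in the weak (integral) sense. The sketch below is an illustration only, not part of the text: it replaces $\delta$ by a narrow Gaussian (the width `eps`, the grid, and the test function $\varphi(x) = e^{-x}$ are arbitrary choices of ours) and checks that integrating $\delta'(f(x))\varphi(x)$ for the $f(x) = (x-1)(x-2)$ of Example 1.4.18 reproduces the prediction $2\varphi(1) + 2\varphi(2) + \varphi'(1) - \varphi'(2)$ obtained from (1.4.51).

```python
import numpy as np

# f(x) = (x - 1)(x - 2) from Example 1.4.18; by the decomposition there,
# integral of delta'(f(x)) phi(x) dx = 2 phi(1) + 2 phi(2) + phi'(1) - phi'(2).
eps = 1e-3                                     # width of the Gaussian nascent delta
x = np.linspace(0.0, 3.0, 2_000_001)           # fine grid covering both zeroes
f = (x - 1.0) * (x - 2.0)

# derivative of the nascent delta, evaluated at v = f(x) (not d/dx), as in (1.4.50)
delta_prime_f = -f / eps**2 * np.exp(-f**2 / (2.0 * eps**2)) / (eps * np.sqrt(2.0 * np.pi))

phi = np.exp(-x)                               # smooth test function, phi'(x) = -exp(-x)
numeric = np.sum(delta_prime_f * phi) * (x[1] - x[0])
predicted = 2.0 * np.exp(-1.0) + 2.0 * np.exp(-2.0) - np.exp(-1.0) + np.exp(-2.0)
print(numeric, predicted)
```

As `eps` shrinks (with a correspondingly finer grid), the two printed values coincide, which is exactly the statement of Example 1.4.18 tested against a smooth function.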

1.4.3 Gamma Function

In this section, we address definitions and properties of the factorial, binomial coef-
ficient, and gamma function (Andrews 1999; Wallis and George 2010).

1.4.3.1 Factorial and Binomial Coefficient

Definition 1.4.8 (falling factorial; factorial) We call


46 1 Preliminaries

$$[m]_k = \begin{cases} 1, & k = 0,\\ m(m-1)\cdots(m-k+1), & k = 1, 2, \ldots \end{cases} \qquad (1.4.56)$$

the $k$th falling factorial of $m$. The number of orderings of $n$ distinct objects is

$$n! = [n]_n \qquad (1.4.57)$$

for $n = 1, 2, \ldots$, where the symbol $!$ is called the factorial.


Consequently,

0! = 1. (1.4.58)

Example 1.4.20 If we use each of the five numbers {1, 2, 3, 4, 5} once, we can
generate 5! = 5 × 4 × 3 × 2 × 1 = 120 five-digit numbers. ♦
Definition 1.4.9 (permutation) The number of ordered arrangements with k differ-
ent items from n different items is

n Pk = [n]k (1.4.59)

for k = 0, 1, . . . , n, and n Pk is called the (n, k) permutation.


Example 1.4.21 If we choose two numbers from {2, 3, 4, 5, 6} and use each of the
two numbers only once, then we can make 5 P2 = 20 two-digit numbers. ♦
Theorem 1.4.5 The number of arrangements with k different items from n different
items is n k if repetitions are allowed.
Example 1.4.22 We can make $10^4$ passwords of four digits with $\{0, 1, \ldots, 9\}$. ♦
Definition 1.4.10 (combination) The number of ways to choose $k$ different items from $n$ items is

$$_n\mathrm{C}_k = \frac{n!}{(n-k)!\,k!} \qquad (1.4.60)$$

for $k = 0, 1, \ldots, n$, and $_n\mathrm{C}_k$, written also as $\binom{n}{k}$, is called the $(n, k)$ combination.

The symbol $_n\mathrm{C}_k$ shown in (1.4.60) is also called the binomial coefficient, and satisfies

$$_n\mathrm{C}_k = {}_n\mathrm{C}_{n-k}. \qquad (1.4.61)$$

From (1.4.59) and (1.4.60), we have

$$_n\mathrm{C}_k = \frac{{}_n\mathrm{P}_k}{k!}. \qquad (1.4.62)$$

Example 1.4.23 We can choose two different numbers from the set {0, 1, . . . , 9} in
10 C2 = 45 different ways. ♦
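The counts in Examples 1.4.21–1.4.23, together with the relations (1.4.61) and (1.4.62), can be reproduced with Python's standard library; `math.perm` and `math.comb` (available from Python 3.8) compute exactly the quantities in (1.4.59) and (1.4.60). This is an illustrative aside, not part of the original text.

```python
import math

print(math.perm(5, 2))    # (5,2) permutation: 20 two-digit numbers (Example 1.4.21)
print(10 ** 4)            # arrangements with repetition: 10^4 passwords (Example 1.4.22)
print(math.comb(10, 2))   # (10,2) combination: 45 choices of two digits (Example 1.4.23)

# the symmetry (1.4.61) and the relation (1.4.62)
assert math.comb(10, 2) == math.comb(10, 8)
assert math.comb(10, 2) == math.perm(10, 2) // math.factorial(2)
```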

Theorem 1.4.6 For $n$ repetitions of choosing one element from $\{\omega_1, \omega_2, \ldots, \omega_m\}$, the number of results in which we have $n_i$ of $\omega_i$ for $i = 1, 2, \ldots, m$ is

$$\binom{n}{n_1, n_2, \ldots, n_m} = \begin{cases} \dfrac{n!}{n_1!\, n_2!\cdots n_m!}, & \text{if } \sum\limits_{i=1}^{m} n_i = n,\\ 0, & \text{otherwise}, \end{cases} \qquad (1.4.63)$$

where $n_i \in \{0, 1, \ldots, n\}$. The left-hand side $\binom{n}{n_1, n_2, \ldots, n_m}$ of (1.4.63) is called the multinomial coefficient.

Proof We have

$$\binom{n}{n_1}\binom{n-n_1}{n_2}\cdots\binom{n-n_1-n_2-\cdots-n_{m-1}}{n_m} = \frac{n!}{n_1!\,(n-n_1)!}\cdot\frac{(n-n_1)!}{n_2!\,(n-n_1-n_2)!}\cdots\frac{(n-n_1-n_2-\cdots-n_{m-1})!}{n_m!} = \frac{n!}{n_1!\, n_2!\cdots n_m!} \qquad (1.4.64)$$

because the number of desired results is the number of ways in which $\omega_1$ occurs $n_1$ times among the $n$ occurrences, $\omega_2$ occurs $n_2$ times among the remaining $n-n_1$ occurrences, $\cdots$, and $\omega_m$ occurs $n_m$ times among the remaining $n-n_1-n_2-\cdots-n_{m-1}$ occurrences. ♠

The multinomial coefficient is clearly a generalization of the binomial coefficient.

Example 1.4.24 Let $A = \{1, 2, 3\}$, $B = \{4, 5\}$, and $C = \{6\}$ in rolling a die. When the rolling is repeated 10 times, the number of results in which $A$, $B$, and $C$ occur four, five, and one times, respectively, is $\binom{10}{4, 5, 1} = 1260$. ♦
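Equation (1.4.63) translates directly into a few lines of Python; the helper name `multinomial` below is ours, not the book's.

```python
import math

def multinomial(n, parts):
    """Multinomial coefficient of (1.4.63): n! / (n_1! n_2! ... n_m!),
    and 0 when the n_i do not sum to n."""
    if sum(parts) != n:
        return 0
    out = math.factorial(n)
    for n_i in parts:
        out //= math.factorial(n_i)   # each stepwise division is exact
    return out

# Example 1.4.24: A, B, C occurring four, five, and one times in ten rolls
print(multinomial(10, [4, 5, 1]))     # 1260
```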

1.4.3.2 Gamma Function

For $\alpha > 0$,

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1}\, e^{-x}\, dx \qquad (1.4.65)$$

is called the gamma function, which satisfies

$$\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1) \qquad (1.4.66)$$

and

$$\Gamma(n) = (n-1)! \qquad (1.4.67)$$

when $n$ is a natural number. In other words, the gamma function can be viewed as a generalization of the factorial.
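Relations (1.4.66) and (1.4.67) are easy to confirm numerically with Python's `math.gamma`; the sample point 3.7 below is an arbitrary choice of ours.

```python
import math

# Gamma(n) = (n - 1)! for natural n, as in (1.4.67)
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# the recursion Gamma(a) = (a - 1) Gamma(a - 1), as in (1.4.66)
a = 3.7
assert math.isclose(math.gamma(a), (a - 1.0) * math.gamma(a - 1.0))
print("ok")
```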
Let us now consider a further generalization. When $\alpha < 0$ and $\alpha \ne -1, -2, \ldots$, we can define the gamma function as

$$\Gamma(\alpha) = \frac{\Gamma(\alpha+k+1)}{\alpha(\alpha+1)(\alpha+2)\cdots(\alpha+k)}, \qquad (1.4.68)$$

where $k$ is the smallest integer such that $\alpha + k + 1 > 0$. Next, for a complex number $z$, let¹⁴

$$(z)_n = \begin{cases} 1, & n = 0,\\ z(z+1)\cdots(z+n-1), & n = 1, 2, \ldots \end{cases} \qquad (1.4.69)$$

Then, for non-negative integers $\alpha$ and $n$, we can express the factorial as

$$\alpha! = \frac{(\alpha+n)(\alpha+n-1)\cdots(\alpha+1)\,\alpha(\alpha-1)\cdots 1}{(\alpha+1)(\alpha+2)\cdots(\alpha+n)} = \frac{(\alpha+n)!}{(\alpha+1)_n} \qquad (1.4.70)$$

from (1.4.67)–(1.4.69). Rewriting (1.4.70) as $\alpha! = \frac{(n+1)_{\alpha}\, n!}{(\alpha+1)_n}$ and subsequently as

$$\alpha! = \frac{n^{\alpha}\, n!}{(\alpha+1)_n}\,\frac{(n+1)_{\alpha}}{n^{\alpha}}, \qquad (1.4.71)$$

we have

$$\alpha! = \lim_{n\to\infty} \frac{n^{\alpha}\, n!}{(\alpha+1)_n} \qquad (1.4.72)$$

because $\lim_{n\to\infty} \frac{(n+1)_{\alpha}}{n^{\alpha}} = \lim_{n\to\infty}\left(1+\frac{1}{n}\right)\left(1+\frac{2}{n}\right)\cdots\left(1+\frac{\alpha}{n}\right) = 1$. Based on (1.4.72), the gamma function for a complex number $\alpha$ such that $\alpha \ne 0, -1, -2, \ldots$ can be defined as

$$\Gamma(\alpha) = \lim_{n\to\infty} \frac{n^{\alpha-1}\, n!}{(\alpha)_n}, \qquad (1.4.73)$$

which can also be written as

¹⁴ Here, $(z)_n$ is called the rising factorial, ascending factorial, rising sequential product, upper factorial, Pochhammer's symbol, Pochhammer function, or Pochhammer polynomial, and is the same as Appell's symbol $(z, n)$.

$$\Gamma(\alpha) = \lim_{n\to\infty} \frac{n^{\alpha}\, n!}{(\alpha)_{n+1}} \qquad (1.4.74)$$

because $\lim_{n\to\infty} \frac{n^{\alpha-1}\, n!}{(\alpha)_n} = \lim_{n\to\infty} \frac{(\alpha+n)\, n^{\alpha-1}\, n!}{(\alpha)_{n+1}} = \lim_{n\to\infty} \frac{n^{\alpha}\, n!}{(\alpha)_{n+1}}\left(1+\frac{\alpha}{n}\right)$. Now, recollecting $\Gamma(1) = \lim_{n\to\infty} \frac{n^0\, n!}{(1)_n} = \lim_{n\to\infty} \frac{n!}{n!} = 1$ and $(\alpha+1)_n = (\alpha)_n\,\frac{\alpha+n}{\alpha}$, we have $\Gamma(\alpha+1) = \lim_{n\to\infty} \frac{n^{\alpha}\, n!}{(\alpha+1)_n} = \lim_{n\to\infty} \frac{\alpha n}{\alpha+n}\,\lim_{n\to\infty} \frac{n^{\alpha-1}\, n!}{(\alpha)_n}$ from (1.4.73), and therefore

$$\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha), \qquad (1.4.75)$$

which is the same as (1.4.66). Based on (1.4.75), we can obtain $\lim_{p\to 0} p\,\Gamma(p) = \lim_{p\to 0}\Gamma(p+1)$, i.e.,

$$\lim_{p\to 0} p\,\Gamma(p) = 1. \qquad (1.4.76)$$

In parallel, when $a \ge b$, we have $\lim_{n\to\infty} (cn)^{b-a}\,\frac{\Gamma(cn+a)}{\Gamma(cn+b)} = \lim_{n\to\infty} (cn)^{b-a}\,(cn+a-1)(cn+a-2)\cdots(cn+b) = \lim_{n\to\infty} (cn)^{b-a}(cn)^{a-b}$, i.e.,

$$\lim_{n\to\infty} (cn)^{b-a}\,\frac{\Gamma(cn+a)}{\Gamma(cn+b)} = 1. \qquad (1.4.77)$$

We can similarly show that (1.4.77) also holds true for $a < b$.
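Both the limit definition (1.4.73) and the ratio limit (1.4.77) lend themselves to a quick numerical check against `math.gamma` and `math.lgamma`. This sketch is ours: the truncation points below are arbitrary, and the error decays like $1/n$.

```python
import math

def gamma_limit(alpha, n=200_000):
    """Approximate Gamma(alpha) by n^(alpha-1) n! / (alpha)_n as in (1.4.73).

    Computed in logarithms to avoid overflow; used here for alpha > 0."""
    log_val = (alpha - 1.0) * math.log(n) + math.lgamma(n + 1)
    for k in range(n):                 # subtract log of the rising factorial (alpha)_n
        log_val -= math.log(alpha + k)
    return math.exp(log_val)

for a in (0.5, 1.7, 3.2):
    print(a, gamma_limit(a), math.gamma(a))   # agreement improves as n grows

# (1.4.77): (cn)^(b-a) Gamma(cn + a) / Gamma(cn + b) -> 1 as n -> infinity
a, b, c, n = 2.3, 0.7, 1.5, 10**6
ratio = math.exp((b - a) * math.log(c * n) + math.lgamma(c * n + a) - math.lgamma(c * n + b))
print(ratio)
```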
The gamma function $\Gamma(\alpha)$ is analytic at all points except at $\alpha = 0, -1, \ldots$. In addition, noting that $\lim_{\alpha\to -k}(\alpha+k)\Gamma(\alpha) = \lim_{\alpha\to -k}\lim_{n\to\infty} \frac{(\alpha+k)\, n^{\alpha-1}\, n!}{(\alpha)_n}$, or

$$\lim_{\alpha\to -k}(\alpha+k)\Gamma(\alpha) = \lim_{n\to\infty} \frac{n^{-k-1}\, n!}{(-k)(-k+1)\cdots(-k+n-1)} = \frac{(-1)^k}{k!}\lim_{n\to\infty} \frac{n!}{n^{k+1}\,(n-k-1)!} = \frac{(-1)^k}{k!}\lim_{n\to\infty} \frac{(n-k)(n-k+1)\cdots n}{n^{k+1}} = \frac{(-1)^k}{k!}\lim_{n\to\infty}\left(1-\frac{k}{n}\right)\left(1-\frac{k-1}{n}\right)\cdots\left(1-\frac{0}{n}\right) = \frac{(-1)^k}{k!} \qquad (1.4.78)$$

for $k = 0, 1, \ldots$ because $(-k)(-k+1)\cdots(-k+n-1) = (-k)(-k+1)\cdots(-k+k-1)\,(-k+k+1)\cdots(-k+n-1) = (-1)^k\, k!\,(n-k-1)!$, the residue of $\Gamma(\alpha)$ at the simple pole $\alpha \in \{0, -1, -2, \ldots\}$ is $\frac{(-1)^{-\alpha}}{(-\alpha)!}$.

As we shall show later in (1.A.31)–(1.A.39), we have (Abramowitz and Stegun 1972)

$$\Gamma(1-x)\Gamma(x) = \frac{\pi}{\sin \pi x} \qquad (1.4.79)$$

for $0 < x < 1$, which is called the Euler reflection formula. Because we have $\Gamma(\epsilon-n)\Gamma(n+1-\epsilon) = \frac{\pi}{\sin \pi\{(n+1)-\epsilon\}} = \frac{(-1)^n \pi}{\sin \pi\epsilon}$ when $x = n+1-\epsilon$ and $\Gamma(-\epsilon)\Gamma(1+\epsilon) = \frac{\pi}{\sin(\pi+\pi\epsilon)} = -\frac{\pi}{\sin \pi\epsilon}$ when $x = 1+\epsilon$ from (1.4.79), we have

$$\Gamma(\epsilon-n) = (-1)^{n-1}\,\frac{\Gamma(-\epsilon)\Gamma(1+\epsilon)}{\Gamma(n+1-\epsilon)}. \qquad (1.4.80)$$

Replacing $x$ with $\frac{1}{2}+x$ and $\frac{3}{4}+x$ in (1.4.79), we have

$$\Gamma\left(\frac{1}{2}-x\right)\Gamma\left(\frac{1}{2}+x\right) = \frac{\pi}{\cos \pi x} \qquad (1.4.81)$$

and

$$\Gamma\left(\frac{1}{4}-x\right)\Gamma\left(\frac{3}{4}+x\right) = \frac{\sqrt{2}\,\pi}{\cos \pi x - \sin \pi x}, \qquad (1.4.82)$$

respectively.
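The reflection formulas (1.4.79), (1.4.81), and (1.4.82) can be spot-checked with `math.gamma`; the point $x = 0.2$ is an arbitrary choice inside the stated ranges.

```python
import math

x = 0.2
# Euler reflection formula (1.4.79)
assert math.isclose(math.gamma(1.0 - x) * math.gamma(x), math.pi / math.sin(math.pi * x))
# shifted variant (1.4.81)
assert math.isclose(math.gamma(0.5 - x) * math.gamma(0.5 + x), math.pi / math.cos(math.pi * x))
# shifted variant (1.4.82)
assert math.isclose(
    math.gamma(0.25 - x) * math.gamma(0.75 + x),
    math.sqrt(2.0) * math.pi / (math.cos(math.pi * x) - math.sin(math.pi * x)),
)
print("ok")
```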

Example 1.4.25 We get

$$\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi} \qquad (1.4.83)$$

with $x = \frac{1}{2}$ in (1.4.79). ♦

Example 1.4.26 By recollecting (1.4.75) and (1.4.83), we can obtain $\Gamma\left(\frac{1}{2}+k\right) = \left(\frac{1}{2}+k-1\right)\left(\frac{1}{2}+k-2\right)\cdots\frac{1}{2}\,\Gamma\left(\frac{1}{2}\right) = \frac{1\times 3\times\cdots\times(2k-1)}{2^k}\,\sqrt{\pi}$, i.e.,

$$\Gamma\left(\frac{1}{2}+k\right) = \frac{\Gamma(2k+1)}{2^{2k}\,\Gamma(k+1)} = \frac{(2k)!}{2^{2k}\, k!}\,\sqrt{\pi} \qquad (1.4.84)$$

for $k \in \{0, 1, \ldots\}$. Similarly, we get

$$\Gamma\left(\frac{1}{2}-k\right) = (-1)^k\,\frac{2^{2k}\, k!}{(2k)!}\,\sqrt{\pi} \qquad (1.4.85)$$

for $k \in \{0, 1, \ldots\}$ using $\Gamma\left(\frac{1}{2}\right) = \left(\frac{1}{2}-1\right)\left(\frac{1}{2}-2\right)\cdots\left(\frac{1}{2}-k\right)\Gamma\left(\frac{1}{2}-k\right) = (-1)^k\,\frac{1\times 3\times\cdots\times(2k-1)}{2^k}\,\Gamma\left(\frac{1}{2}-k\right) = (-1)^k\,\frac{(2k)!}{2^{2k}\, k!}\,\Gamma\left(\frac{1}{2}-k\right)$. ♦

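The closed forms (1.4.84)–(1.4.86) for half-integer arguments can be verified directly against `math.gamma`; the helper names below are ours.

```python
import math

def gamma_half_plus(k):
    # Gamma(1/2 + k) = (2k)! sqrt(pi) / (2^(2k) k!), as in (1.4.84)
    return math.factorial(2 * k) / (4 ** k * math.factorial(k)) * math.sqrt(math.pi)

def gamma_half_minus(k):
    # Gamma(1/2 - k) = (-1)^k 2^(2k) k! sqrt(pi) / (2k)!, as in (1.4.85)
    return (-1) ** k * 4 ** k * math.factorial(k) / math.factorial(2 * k) * math.sqrt(math.pi)

for k in range(6):
    assert math.isclose(gamma_half_plus(k), math.gamma(0.5 + k))
    assert math.isclose(gamma_half_minus(k), math.gamma(0.5 - k))
    # the product identity (1.4.86)
    assert math.isclose(gamma_half_plus(k) * gamma_half_minus(k), (-1) ** k * math.pi)
print("ok")
```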
From (1.4.84) and (1.4.85), we have

$$\Gamma\left(\frac{1}{2}-k\right)\Gamma\left(\frac{1}{2}+k\right) = (-1)^k\,\pi, \qquad (1.4.86)$$

which is the same as (1.4.81) with $x$ an integer $k$. We can obtain $\Gamma\left(\frac{3}{2}\right) = \frac{1}{2}\,\Gamma\left(\frac{1}{2}\right) = \frac{\sqrt{\pi}}{2}$, $\Gamma\left(\frac{5}{2}\right) = \frac{3}{2}\,\Gamma\left(\frac{3}{2}\right) = \frac{3\sqrt{\pi}}{4}$, $\Gamma\left(\frac{7}{2}\right) = \frac{5}{2}\,\Gamma\left(\frac{5}{2}\right) = \frac{15\sqrt{\pi}}{8}$, $\cdots$ from (1.4.84), and $\Gamma\left(-\frac{1}{2}\right) = -2\,\Gamma\left(\frac{1}{2}\right) = -2\sqrt{\pi}$, $\Gamma\left(-\frac{3}{2}\right) = -\frac{2}{3}\,\Gamma\left(-\frac{1}{2}\right) = \frac{4\sqrt{\pi}}{3}$, $\Gamma\left(-\frac{5}{2}\right) = -\frac{2}{5}\,\Gamma\left(-\frac{3}{2}\right) = -\frac{8\sqrt{\pi}}{15}$, $\cdots$ from (1.4.85). Some of such values are shown in Table 1.2.

We can rewrite (1.4.79) as

$$\Gamma(1-x)\Gamma(1+x) = \frac{\pi x}{\sin \pi x} \qquad (1.4.87)$$

by recollecting $\Gamma(x+1) = x\Gamma(x)$ shown in (1.4.75). In addition, using (1.4.79)–(1.4.82), (1.4.86), and (1.4.87), we have¹⁵ $\Gamma\left(\frac{1}{3}\right)\Gamma\left(\frac{2}{3}\right) = \frac{2\pi}{\sqrt{3}}$, $\Gamma\left(\frac{1}{4}\right)\Gamma\left(\frac{3}{4}\right) = \sqrt{2}\,\pi$, and $\Gamma\left(\frac{1}{6}\right)\Gamma\left(\frac{5}{6}\right) = 2\pi$. Note also that, by letting $v = t^{\beta}$ when $\beta > 0$, we have $\int_0^{\infty} t^{\alpha}\exp\left(-t^{\beta}\right) dt = \int_0^{\infty} v^{\frac{\alpha}{\beta}}\, e^{-v}\,\frac{1}{\beta}\, v^{\frac{1}{\beta}-1}\, dv = \frac{1}{\beta}\int_0^{\infty} v^{\frac{\alpha+1}{\beta}-1}\, e^{-v}\, dv = \frac{1}{\beta}\,\Gamma\left(\frac{\alpha+1}{\beta}\right)$. Subsequently, we have

$$\int_0^{\infty} t^{\alpha}\exp\left(-t^{\beta}\right) dt = \frac{1}{|\beta|}\,\Gamma\left(\frac{\alpha+1}{\beta}\right) \qquad (1.4.88)$$

because $\int_0^{\infty} t^{\alpha}\exp\left(-t^{\beta}\right) dt = \int_{\infty}^{0} v^{\frac{\alpha}{\beta}}\, e^{-v}\,\frac{1}{\beta}\, v^{\frac{1}{\beta}-1}\, dv = -\frac{1}{\beta}\,\Gamma\left(\frac{\alpha+1}{\beta}\right)$ by letting $v = t^{\beta}$ when $\beta < 0$.
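Formula (1.4.88) is easily checked by quadrature for one sample pair of exponents ($\alpha = 2$, $\beta = 3$ below, both arbitrary choices of ours); for these values the exact answer is $\Gamma(1)/3 = 1/3$.

```python
import math
import numpy as np

alpha, beta = 2.0, 3.0
t = np.linspace(1e-8, 10.0, 1_000_001)          # the integrand is negligible beyond t = 10
integrand = t ** alpha * np.exp(-(t ** beta))
numeric = np.sum(integrand) * (t[1] - t[0])     # simple Riemann sum
exact = math.gamma((alpha + 1.0) / beta) / abs(beta)
print(numeric, exact)                           # both close to 1/3
```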
When $\alpha \in \{-1, -2, \ldots\}$, we have (Artin 1964)

$$\Gamma(\alpha+1) \to \pm\infty. \qquad (1.4.89)$$

More specifically, the value $\Gamma\left(\alpha+1^{\pm}\right) = \left(\alpha^{\pm}\right)!$ can be expressed as

$$\Gamma\left(\alpha+1^{\pm}\right) \to \pm(-1)^{\alpha+1}\,\infty. \qquad (1.4.90)$$

¹⁵ Here, $\sqrt{\pi} \approx 1.7725$, $\frac{2\pi}{\sqrt{3}} \approx 3.6276$, and $\sqrt{2}\,\pi \approx 4.4429$. In addition, we have $\Gamma\left(\frac{1}{8}\right) \approx 7.5339$, $\Gamma\left(\frac{1}{7}\right) \approx 6.5481$, $\Gamma\left(\frac{1}{6}\right) \approx 5.5663$, $\Gamma\left(\frac{1}{5}\right) \approx 4.5908$, $\Gamma\left(\frac{1}{4}\right) \approx 3.6256$, $\Gamma\left(\frac{1}{3}\right) \approx 2.6789$, $\Gamma\left(\frac{2}{3}\right) \approx 1.3541$, $\Gamma\left(\frac{3}{4}\right) \approx 1.2254$, $\Gamma\left(\frac{4}{5}\right) \approx 1.1642$, $\Gamma\left(\frac{5}{6}\right) \approx 1.1288$, $\Gamma\left(\frac{6}{7}\right) \approx 1.1058$, and $\Gamma\left(\frac{7}{8}\right) \approx 1.0897$.

For instance, we have $\lim_{\alpha\downarrow -1}\Gamma(\alpha+1) = +\infty$, $\lim_{\alpha\uparrow -1}\Gamma(\alpha+1) = -\infty$, $\lim_{\alpha\downarrow -2}\Gamma(\alpha+1) = -\infty$, $\lim_{\alpha\uparrow -2}\Gamma(\alpha+1) = +\infty$, $\cdots$.

Finally, when $\alpha-\beta$ is an integer, $\alpha < 0$, and $\beta < 0$, consider a number $v$ for which both $\alpha-v$ and $\beta-v$ are natural numbers. Specifically, let $v = \min(\alpha, \beta) - k$ for $k = 1, 2, \ldots$: for instance, $v \in \{-\pi-1, -\pi-2, \ldots\}$ when $\alpha = -\pi$ and $\beta = 1-\pi$, and $v \in \{-6.4, -7.4, \ldots\}$ when $\alpha = -3.4$ and $\beta = -5.4$. Rewriting $\frac{\Gamma(\alpha+1)}{\Gamma(\beta+1)} = \frac{\Gamma(\alpha+1)}{\Gamma(v)}\,\frac{\Gamma(v)}{\Gamma(\beta+1)} = \frac{\alpha(\alpha-1)\cdots v}{\beta(\beta-1)\cdots v} = \frac{(-1)^{\alpha-v+1}\,(-v)(-v-1)\cdots(-\alpha+1)(-\alpha)}{(-1)^{\beta-v+1}\,(-v)(-v-1)\cdots(-\beta+1)(-\beta)} = (-1)^{\alpha-\beta}\,\frac{\Gamma(-v+1)}{\Gamma(-\alpha)}\,\frac{\Gamma(-\beta)}{\Gamma(-v+1)}$, we get

$$\frac{\Gamma(\alpha+1)}{\Gamma(\beta+1)} = (-1)^{\alpha-\beta}\,\frac{\Gamma(-\beta)}{\Gamma(-\alpha)}. \qquad (1.4.91)$$

Based on (1.4.91), it is possible to obtain $\lim_{x\downarrow\beta,\,y\downarrow\alpha}\frac{\Gamma(y+1)}{\Gamma(x+1)} = \lim_{x\uparrow\beta,\,y\uparrow\alpha}\frac{\Gamma(y+1)}{\Gamma(x+1)} = (-1)^{\alpha-\beta}\,\frac{\Gamma(-\beta)}{\Gamma(-\alpha)}$ and $\lim_{x\downarrow\beta,\,y\uparrow\alpha}\frac{\Gamma(y+1)}{\Gamma(x+1)} = \lim_{x\uparrow\beta,\,y\downarrow\alpha}\frac{\Gamma(y+1)}{\Gamma(x+1)} = (-1)^{\alpha-\beta+1}\,\frac{\Gamma(-\beta)}{\Gamma(-\alpha)}$: in essence, we have¹⁶

$$\begin{cases} \dfrac{\Gamma\left(\alpha^{\pm}\right)}{\Gamma\left(\beta^{\pm}\right)} = (-1)^{\alpha-\beta}\,\dfrac{\Gamma(-\beta+1)}{\Gamma(-\alpha+1)},\\[1ex] \dfrac{\Gamma\left(\alpha^{\pm}\right)}{\Gamma\left(\beta^{\mp}\right)} = (-1)^{\alpha-\beta+1}\,\dfrac{\Gamma(-\beta+1)}{\Gamma(-\alpha+1)}. \end{cases} \qquad (1.4.92)$$

This expression is quite useful when we deal with permutations and combinations of negative integers, as we shall see later in (1.A.3) and Table 1.4.

Example 1.4.27 We have $\frac{\Gamma(1-\pi)}{\Gamma(2-\pi)} = (-1)^{-1}\,\frac{\Gamma(\pi-1)}{\Gamma(\pi)} = \frac{1}{1-\pi}$ from (1.4.91). We can easily get $\frac{\Gamma\left(-5^+\right)}{\Gamma\left(-4^+\right)} = -\frac{\Gamma(5)}{\Gamma(6)} = -\frac{1}{5}$ from (1.4.91) or (1.4.92). ♦

Example 1.4.28 It is known (Abramowitz and Stegun 1972) that $\Gamma(j) \approx -0.1550 - j0.4980$, where $j = \sqrt{-1}$. Now, recollect $\Gamma(-z) = -\frac{\Gamma(1-z)}{z}$ from $\Gamma(1-z) = -z\,\Gamma(-z)$. Thus, using the Euler reflection formula (1.4.79), we have $\Gamma(z)\Gamma(-z) = -\frac{\Gamma(z)\Gamma(1-z)}{z}$, i.e.,

$$\Gamma(z)\Gamma(-z) = -\frac{\pi}{z \sin \pi z} \qquad (1.4.93)$$

for $z \ne 0, \pm 1, \pm 2, \ldots$. Next, because $e^{\pm jx} = \cos x \pm j\sin x$, we have $\sin x = \frac{1}{2j}\left(e^{jx} - e^{-jx}\right)$, $\Gamma(\bar z) = \overline{\Gamma(z)}$, and $\Gamma(1-j) = -j\,\Gamma(-j)$. Thus, we have $\Gamma(j)\Gamma(-j) = |\Gamma(j)|^2 = -\frac{\pi}{j \sin \pi j}$, i.e.,

$$|\Gamma(j)|^2 = \frac{2\pi}{e^{\pi} - e^{-\pi}}, \qquad (1.4.94)$$

16 When we use this result, we assume α+ and β + unless specified otherwise.



Table 1.2 Values of $\Gamma(z)$ for $z = -\frac{9}{2}, -\frac{7}{2}, \ldots, \frac{9}{2}$

z:     −9/2        −7/2       −5/2      −3/2    −1/2    1/2   3/2    5/2     7/2      9/2
Γ(z):  −32√π/945   16√π/105   −8√π/15   4√π/3   −2√π    √π    √π/2   3√π/4   15√π/8   105√π/16

where $\frac{2\pi}{e^{\pi}-e^{-\pi}} \approx 0.2720$ and $0.1550^2 + 0.4980^2 \approx 0.2720$. ♦

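Identity (1.4.94) can be checked without any complex-gamma library: the limit definition (1.4.74) works verbatim with complex arithmetic. The sketch below is ours (the truncation $n = 10^6$ is an arbitrary choice; the error decays like $1/n$) and accumulates logarithms to avoid overflow.

```python
import numpy as np

# Gamma(z) ~ n^z n! / (z (z+1) ... (z+n)) as in (1.4.74), here with z = j
n = 10 ** 6
z = 1j
k = np.arange(0, n + 1)
log_gamma = z * np.log(n) + np.sum(np.log(np.arange(1.0, n + 1))) - np.sum(np.log(z + k))
gamma_j = np.exp(log_gamma)
print(gamma_j)                                   # close to -0.1550 - 0.4980j

lhs = abs(gamma_j) ** 2
rhs = 2.0 * np.pi / (np.exp(np.pi) - np.exp(-np.pi))
print(lhs, rhs)                                  # both approximately 0.2720
```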
Example 1.4.29 If we consider only the region $\alpha > 0$, then the gamma function (Zhang and Jian 1996) exhibits the minimum value $\Gamma(\alpha_0) \approx 0.8856$ at $\alpha = \alpha_0 \approx 1.4616$, and is convex downward because $\Gamma''(\alpha) > 0$. ♦

1.4.3.3 Beta Function

The beta function is defined as¹⁷

$$\tilde{B}(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\, dx \qquad (1.4.95)$$

for complex numbers $\alpha$ and $\beta$ such that $\mathrm{Re}(\alpha) > 0$ and $\mathrm{Re}(\beta) > 0$. In this section, let us show

$$\tilde{B}(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \qquad (1.4.96)$$

We have $\tilde{B}(\alpha, \beta+1) = \int_0^1 x^{\alpha-1}(1-x)(1-x)^{\beta-1}\, dx = \tilde{B}(\alpha, \beta) - \tilde{B}(\alpha+1, \beta)$ and $\tilde{B}(\alpha, \beta+1) = \left.\frac{1}{\alpha}\,x^{\alpha}(1-x)^{\beta}\right|_{x=0}^{1} + \frac{\beta}{\alpha}\int_0^1 x^{\alpha}(1-x)^{\beta-1}\, dx = \frac{\beta}{\alpha}\,\tilde{B}(\alpha+1, \beta)$. Thus,

$$\tilde{B}(\alpha, \beta) = \frac{\alpha+\beta}{\beta}\,\tilde{B}(\alpha, \beta+1), \qquad (1.4.97)$$

which can also be obtained as $\tilde{B}(\alpha, \beta+1) = \int_0^1 x^{\alpha+\beta-1}\left(\frac{1-x}{x}\right)^{\beta} dx = \left.\frac{x^{\alpha+\beta}}{\alpha+\beta}\left(\frac{1-x}{x}\right)^{\beta}\right|_{x=0}^{1} + \frac{\beta}{\alpha+\beta}\int_0^1 x^{\alpha+\beta}\left(\frac{1-x}{x}\right)^{\beta-1}\frac{dx}{x^2} = \frac{\beta}{\alpha+\beta}\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\, dx = \frac{\beta}{\alpha+\beta}\,\tilde{B}(\alpha, \beta)$. Using (1.4.97) repeatedly, we get

¹⁷ The right-hand side of (1.4.95) is called the Eulerian integral of the first kind.

$$\tilde{B}(\alpha, \beta) = \frac{(\alpha+\beta)(\alpha+\beta+1)}{\beta(\beta+1)}\,\tilde{B}(\alpha, \beta+2) = \frac{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)}{\beta(\beta+1)(\beta+2)}\,\tilde{B}(\alpha, \beta+3) = \cdots = \frac{(\alpha+\beta)_n}{(\beta)_n}\,\tilde{B}(\alpha, \beta+n), \qquad (1.4.98)$$

which, after the substitution $x = \frac{t}{n}$ and some manipulations, can be expressed as

$$\tilde{B}(\alpha, \beta) = \frac{(\alpha+\beta)_n}{n^{\alpha+\beta-1}\, n!}\cdot\frac{n^{\beta-1}\, n!}{(\beta)_n}\int_0^n t^{\alpha-1}\left(1-\frac{t}{n}\right)^{\beta+n-1} dt. \qquad (1.4.99)$$

Now, because $\lim_{n\to\infty}\left(1-\frac{t}{n}\right)^{\beta+n-1} = \lim_{n\to\infty}\left(1+\frac{-t}{n}\right)^n\left(1-\frac{t}{n}\right)^{\beta-1} = e^{-t}$, if we let $n\to\infty$ in (1.4.99) and recollect the defining equation (1.4.73) of the gamma function, we get

$$\tilde{B}(\alpha, \beta) = \frac{\Gamma(\beta)}{\Gamma(\alpha+\beta)}\int_0^{\infty} t^{\alpha-1} e^{-t}\, dt. \qquad (1.4.100)$$

Next, from (1.4.95) with $\beta = 1$, we get $\tilde{B}(\alpha, 1) = \int_0^1 t^{\alpha-1}\, dt = \frac{1}{\alpha}$. Using this result in (1.4.100) with $\beta = 1$, we get

$$\frac{1}{\alpha} = \frac{\Gamma(1)}{\Gamma(\alpha+1)}\int_0^{\infty} t^{\alpha-1} e^{-t}\, dt. \qquad (1.4.101)$$

Therefore, recollecting $\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$ and $\Gamma(1) = 1$, we have¹⁸

$$\int_0^{\infty} t^{\alpha-1} e^{-t}\, dt = \Gamma(\alpha). \qquad (1.4.102)$$

From (1.4.100) and (1.4.102), we get (1.4.96). Note that (1.4.100)–(1.4.102) implicitly dictate that the defining equation (1.4.65) of the gamma function $\Gamma(\alpha)$ for $\alpha > 0$ is a special case of (1.4.73).

¹⁸ The left-hand side of (1.4.102) is called the Eulerian integral of the second kind. In Exercise 1.38, we consider another way to show (1.4.96).
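The identity (1.4.96) can be confirmed by quadrature of the defining integral (1.4.95); the sample arguments $\alpha = 2.5$, $\beta = 3.5$ are arbitrary choices of ours.

```python
import math
import numpy as np

alpha, beta = 2.5, 3.5
x = np.linspace(0.0, 1.0, 2_000_001)[1:-1]      # interior points; endpoints contribute 0 here
numeric = np.sum(x ** (alpha - 1.0) * (1.0 - x) ** (beta - 1.0)) * (x[1] - x[0])
exact = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
print(numeric, exact)
```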

1.5 Limits of Sequences of Sets

In this section, the properties of infinite sequences of sets are discussed. The expo-
sition in this section will be the basis, for instance, in discussing the σ-algebra in
Sect. 2.1.2 and the continuity of probability in Appendix 2.1.

1.5.1 Upper and Lower Limits of Sequences

Let us first consider the limits of sequences of numbers before addressing the limits
of sequences of sets. When ai ≤ u and ai ≥ v for every choice of a number ai from
the set A of real numbers, the numbers u and v are called an upper bound and a lower
bound, respectively, of A.

Definition 1.5.1 (least upper bound; greatest lower bound) For a subset A of real
numbers, the smallest among the upper bounds of A is called the least upper bound
of A and is denoted by sup A, and the largest among the lower bounds of A is called
the greatest lower bound and is denoted by inf A.

For a sequence $\{x_n\}_{n=1}^{\infty}$ of real numbers, the least upper bound and greatest lower bound are written as

$$\sup_{n\ge 1} x_n = \sup\{x_n : n \ge 1\} \qquad (1.5.1)$$

and

$$\inf_{n\ge 1} x_n = \inf\{x_n : n \ge 1\}, \qquad (1.5.2)$$

respectively. When there exists no upper bound or no lower bound of $A$, we write $\sup A \to \infty$ and $\inf A \to -\infty$, respectively.

Example 1.5.1 For the set A = {1, 2, 3}, the least upper bound is sup A = 3 and
the greatest lower bound is inf A = 1. For the sequence {an } = {0, 1, 0, 1, . . .}, the
least upper bound is sup an = 1 and the greatest lower bound is inf an = 0. ♦

Definition 1.5.2 (limit superior; limit inferior) For a sequence $\{x_n\}_{n=1}^{\infty}$,

$$\limsup_{n\to\infty} x_n = \overline{x}_n = \inf_{n\ge 1}\sup_{k\ge n} x_k \qquad (1.5.3)$$

and

$$\liminf_{n\to\infty} x_n = \underline{x}_n = \sup_{n\ge 1}\inf_{k\ge n} x_k \qquad (1.5.4)$$

are called the limit superior and limit inferior, respectively.


Example 1.5.2 For the sequences $x_n = \frac{1}{\sqrt{n}}\left(\sqrt{4n+3} + (-1)^n\sqrt{4n-5}\right)$, $y_n = \frac{5}{\sqrt{n}} + (-1)^n$, and $z_n = 3\sin\frac{n\pi}{2}$, we have $\overline{x}_n = 4$ and $\underline{x}_n = 0$, $\overline{y}_n = 1$ and $\underline{y}_n = -1$, and $\overline{z}_n = 3$ and $\underline{z}_n = -3$. ♦

Definition 1.5.3 (limit) For a sequence $\{x_n\}_{n=1}^{\infty}$, $\lim_{n\to\infty} x_n = y$ and $\limsup_{n\to\infty} x_n = \liminf_{n\to\infty} x_n = y$ are necessary and sufficient conditions for each other, where $y$ is called the limit of the sequence $\{x_n\}_{n=1}^{\infty}$.

Example 1.5.3 For the three sequences

$$x_n = \begin{cases} \sqrt{2}, & n = 1,\\ \sqrt{2 + x_{n-1}}, & n = 2, 3, \ldots, \end{cases} \qquad (1.5.5)$$

$$y_n = \begin{cases} \sqrt{n}, & n = 1, 2, \ldots, 9,\\ 3, & n = 10, 11, \ldots, 100,\\ \frac{3n}{5n+4}, & n = 101, 102, \ldots, \end{cases} \qquad (1.5.6)$$

and

$$z_n = \frac{1}{2n}\left(\sqrt{4n+3} + (-1)^n\sqrt{4n-5}\right), \qquad (1.5.7)$$

we have¹⁹ $\overline{x}_n = \underline{x}_n = \lim_{n\to\infty} x_n = 2$, $\overline{y}_n = \underline{y}_n = \lim_{n\to\infty} y_n = \frac{3}{5}$, and $\overline{z}_n = \underline{z}_n = \lim_{n\to\infty} z_n = 0$. ♦
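The limits claimed in Example 1.5.3 can be observed numerically; sixty iterations and the large sample index below are arbitrary cutoffs of ours.

```python
import math

# x_n = sqrt(2 + x_{n-1}) of (1.5.5): increases to the fixed point x = 2
x = math.sqrt(2.0)
for _ in range(60):
    x = math.sqrt(2.0 + x)
print(x)                      # converges to 2

# y_n = 3n / (5n + 4) of (1.5.6) for large n tends to 3/5
n = 10 ** 9
print(3 * n / (5 * n + 4))    # close to 0.6
```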

Example 1.5.4 None of the three sequences $x_n$, $y_n$, and $z_n$ considered in Example 1.5.2 has a limit because the limit superior is different from the limit inferior. ♦

Fig. 1.24 Increasing sequence $\{B_n\}_{n=1}^{\infty}$: $B_{n+1} \supset B_n$ $(B_1 \subset B_2 \subset B_3 \subset \cdots)$

¹⁹ The common value $\overline{x}_n = \underline{x}_n = 2$ can be obtained by solving the equation $x = \sqrt{2+x}$.

1.5.2 Limit of Monotone Sequence of Sets

Definition 1.5.4 (increasing sequence; decreasing sequence; non-decreasing


sequence; non-increasing sequence) For a sequence {Bn }∞ n=1 of sets, assume that
Bn+1 ⊇ Bn for every natural number n. If Bn+1 = Bn for at least one n, then the
sequence is called a non-decreasing sequence; otherwise, it is called an increasing
sequence. A non-increasing sequence and a decreasing sequence are defined similarly
by replacing Bn+1 ⊇ Bn with Bn+1 ⊆ Bn .
Example 1.5.5 The sequences $\left\{\left[1, 2-\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ and $\{(-n, a)\}_{n=1}^{\infty}$ are increasing sequences. The sequences $\left\{\left[1, 1+\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ and $\left\{\left(1-\frac{1}{n}, 1+\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ are decreasing sequences. ♦

Increasing and decreasing sequences are sometimes referred to as strictly increas-


ing and strictly decreasing sequences, respectively. A non-decreasing sequence or
a non-increasing sequence is also called a monotonic sequence. Although increas-
ing and non-decreasing sequences are slightly different from each other, they are
used interchangeably when it does not cause an ambiguity. Similarly, decreasing and
non-increasing sequences will be often used interchangeably. Figures 1.24 and 1.25
graphically show increasing and decreasing sequences, respectively.

Definition 1.5.5 (limit of monotonic sequence) We call

$$\lim_{n\to\infty} B_n = \bigcup_{i=1}^{\infty} B_i \qquad (1.5.8)$$

for a non-decreasing sequence $\{B_n\}_{n=1}^{\infty}$ and

$$\lim_{n\to\infty} B_n = \bigcap_{i=1}^{\infty} B_i \qquad (1.5.9)$$

for a non-increasing sequence $\{B_n\}_{n=1}^{\infty}$ the limit set or limit of $\{B_n\}_{n=1}^{\infty}$.

When $\{B_n\}_{n=1}^{\infty}$ is a non-decreasing sequence, the limit $\lim_{n\to\infty} B_n$ denotes the set of points contained in at least one of $\{B_n\}_{n=1}^{\infty}$; when $\{B_n\}_{n=1}^{\infty}$ is a non-increasing sequence, it denotes the set of points contained in every set of $\{B_n\}_{n=1}^{\infty}$.

Fig. 1.25 Decreasing sequence $\{B_n\}_{n=1}^{\infty}$: $B_{n+1} \subset B_n$ $(B_1 \supset B_2 \supset B_3 \supset \cdots)$
Example 1.5.6 The sequence $\left\{\left[1, 2-\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ considered in Example 1.5.5 has the limit $\lim_{n\to\infty} B_n = \bigcup_{n=1}^{\infty}\left[1, 2-\frac{1}{n}\right) = [1, 1) \cup \left[1, \frac{3}{2}\right) \cup \cdots$, or

$$\lim_{n\to\infty} B_n = [1, 2) \qquad (1.5.10)$$

because $\left\{\left[1, 2-\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ is a non-decreasing sequence. Likewise, the limit of the non-decreasing sequence $\{(-n, a)\}_{n=1}^{\infty}$ is $\lim_{n\to\infty} B_n = \bigcup_{n=1}^{\infty}(-n, a) = (-\infty, a)$. ♦
Example 1.5.7 The sequence $\left\{\left[a, a+\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ is a non-increasing sequence and has the limit $\lim_{n\to\infty} B_n = \bigcap_{n=1}^{\infty}\left[a, a+\frac{1}{n}\right)$, or

$$\lim_{n\to\infty} B_n = [a, a], \qquad (1.5.11)$$

which is the singleton set $\{a\}$. The non-increasing sequence $\left\{\left(1-\frac{1}{n}, 1+\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ has the limit $\lim_{n\to\infty} B_n = \bigcap_{n=1}^{\infty}\left(1-\frac{1}{n}, 1+\frac{1}{n}\right) = (0, 2) \cap \left(\frac{1}{2}, \frac{3}{2}\right) \cap \cdots = [1, 1]$, also a singleton set. Note that

$$\lim_{n\to\infty}\left[0, \frac{1}{n}\right) = \{0\} \qquad (1.5.12)$$

is different from

$$\left[0, \lim_{n\to\infty}\frac{1}{n}\right) = \varnothing. \qquad (1.5.13)$$

♦

Example 1.5.8 Consider the set $S = \{x : 0 \le x \le 1\}$, the sequence $\{A_i\}_{i=1}^{\infty}$ with $A_i = \left\{x : \frac{1}{i+1} < x \le 1\right\}$, and the sequence $\{B_i\}_{i=1}^{\infty}$ with $B_i = \left\{x : 0 < x < \frac{1}{i}\right\}$. Then, because $\{A_i\}_{i=1}^{\infty}$ is a non-decreasing sequence, we have $\lim_{n\to\infty} A_n = \left\{x : \frac{1}{2} < x \le 1\right\} \cup \left\{x : \frac{1}{3} < x \le 1\right\} \cup \cdots = \{x : 0 < x \le 1\}$ and $S = \{0\} \cup \lim_{n\to\infty} A_n$. Similarly, because $\{B_i\}_{i=1}^{\infty}$ is a non-increasing sequence, we have $\lim_{n\to\infty} B_n = \{x : 0 < x < 1\} \cap \left\{x : 0 < x < \frac{1}{2}\right\} \cap \cdots = \{x : 0 < x \le 0\} = \varnothing$. ♦
Example 1.5.9 The sequences $\left\{\left(1+\frac{1}{n}, 2\right)\right\}_{n=1}^{\infty}$ and $\left\{\left(1+\frac{1}{n}, 2\right]\right\}_{n=1}^{\infty}$ of interval sets are both non-decreasing sequences with the limits $(1, 2)$ and $(1, 2]$, respectively. The sequences $\left\{\left(a, a+\frac{1}{n}\right]\right\}_{n=1}^{\infty}$ and $\left\{\left[a, a+\frac{1}{n}\right]\right\}_{n=1}^{\infty}$ are both non-increasing sequences with the limits $(a, a] = \varnothing$ and $[a, a] = \{a\}$, respectively. ♦

Example 1.5.10 The sequences $\left\{\left(1-\frac{1}{n}, 2\right)\right\}_{n=1}^{\infty}$ and $\left\{\left(1-\frac{1}{n}, 2\right]\right\}_{n=1}^{\infty}$ are both non-increasing sequences with the limits $[1, 2)$ and $[1, 2]$, respectively. The sequences $\left\{\left(1, 2-\frac{1}{n}\right]\right\}_{n=1}^{\infty}$ and $\left\{\left[1, 2-\frac{1}{n}\right]\right\}_{n=1}^{\infty}$ are both non-decreasing sequences with the limits $(1, 2)$ and $[1, 2)$, respectively. ♦
Example 1.5.11 The sequences $\left\{\left(1+\frac{1}{n}, 3-\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ and $\left\{\left(1+\frac{1}{n}, 3-\frac{1}{n}\right]\right\}_{n=1}^{\infty}$ are both non-decreasing sequences with the common limit $(1, 3)$. Similarly, the non-decreasing sequences $\left\{\left[1+\frac{1}{n}, 3-\frac{1}{n}\right)\right\}_{n=1}^{\infty}$ and $\left\{\left[1+\frac{1}{n}, 3-\frac{1}{n}\right]\right\}_{n=1}^{\infty}$ both have the limit²⁰ $(1, 3)$. ♦

Example 1.5.12 The four sequences $\left\{\left(1-\frac{1}{n}, 3+\frac{1}{n}\right)\right\}_{n=1}^{\infty}$, $\left\{\left(1-\frac{1}{n}, 3+\frac{1}{n}\right]\right\}_{n=1}^{\infty}$, $\left\{\left[1-\frac{1}{n}, 3+\frac{1}{n}\right)\right\}_{n=1}^{\infty}$, and $\left\{\left[1-\frac{1}{n}, 3+\frac{1}{n}\right]\right\}_{n=1}^{\infty}$ are all non-increasing sequences with the common limit²¹ $[1, 3]$. ♦

1.5.3 Limit of General Sequence of Sets

We have discussed the limits of monotonic sequences in Sect. 1.5.2. Let us now
consider the limits of general sequences. First, note that any element in a set of an
infinite sequence belongs to
(1) every set,
(2) every set except for a finite number of sets,
(3) infinitely many sets except for other infinitely many sets, or
(4) a finite number of sets.
Keeping these four cases in mind, let us define the lower bound and upper bound
sets of general sequences.

Definition 1.5.6 (lower bound set) For a sequence of sets, the set of elements belong-
ing to at least almost every set of the sequence is called the lower bound or lower
bound set of the sequence, and is denoted by22 lim inf or by lim .
n→∞ n→∞

Let us express the lower bound set $\liminf_{n\to\infty} B_n = \underline{B}_n$ of the sequence $\{B_n\}_{n=1}^{\infty}$ in terms of set operations. First, note that

$$G_i = \bigcap_{k=i}^{\infty} B_k \qquad (1.5.14)$$

²⁰ In short, irrespective of the type of the parentheses, the limit is in the form of '(a, …' for an interval when the beginning point is of the form $a + \frac{1}{n}$, and the limit is in the form of '…, b)' for an interval when the end point is of the form $b - \frac{1}{n}$.
²¹ In short, irrespective of the type of the parentheses, the limit is in the form of '[a, …' for an interval when the beginning point is of the form $a - \frac{1}{n}$, and the limit is in the form of '…, b]' for an interval when the end point is of the form $b + \frac{1}{n}$.
²² The acronym inf stands for infimum or inferior.

is the set of elements belonging to Bi , Bi+1 , . . .: in other words, G i is the set of


elements belonging to all the sets of the sequence except for at most (i − 1) sets,
possibly B1 , B2 , . . ., and Bi−1 . Specifically, G 1 is the set of elements belonging to
all the sets of the sequence, G 2 is the set of elements belonging to all the sets except
possibly for the first set, G 3 is the set of elements belonging to all the sets except
possibly for the first and second sets, . . .. This implies that an element belonging to

any of {G i }i=1 is an element belonging to almost every set of the sequence. Therefore,

if we collect all the elements belonging to at least one of {G i }i=1 , or if we take the

union of {G i }i=1 , the result would be the set of elements in every set except for a
finite number of sets. In other words, the set of elements belonging to at least almost every set of the sequence $\{B_n\}_{n=1}^{\infty}$, or the lower bound set $\liminf_{n\to\infty} B_n$ of $\{B_n\}_{n=1}^{\infty}$, can be expressed as

$$\liminf_{n\to\infty} B_n = \bigcup_{i=1}^{\infty}\bigcap_{k=i}^{\infty} B_k, \qquad (1.5.15)$$

which is sometimes denoted by $\{\text{eventually } B_n\}$ or $\{\text{ev. } B_n\}$.


Example 1.5.13 For the sequence

{Bn }∞
n=1 = {0, 1, 3}, {0, 2}, {0, 1, 2}, {0, 1}, {0, 1, 2}, {0, 1}, . . . (1.5.16)

of finite sets, obtain the lower bound set.


Solution First, 0 belongs to all sets, 1 belongs to all sets except for the second set, and
2 and 3 do not belong to infinitely many sets. Thus, the lower bound of the sequence
∞ ∞ ∞
is {0, 1}, which can be confirmed by lim inf Bn = ∪ ∩ Bk = ∪ G i = {0, 1} using
n→∞ i=1 k=i i=1
∞ ∞ ∞ ∞
G 1 = ∩ Bk = {0}, G 2 = ∩ Bk = {0}, G 3 = ∩ Bk = {0, 1}, G 4 = ∩ Bk = {0, 1},
k=1 k=2 k=3 k=4
. . .. ♦
Definition 1.5.7 (upper bound set) For a sequence of sets, the set of elements belong-
ing to infinitely many sets of the sequence is called the upper bound or upper bound
set of the sequence, and is denoted by23 lim sup or by lim .
n→∞ n→∞

Because an element belonging to almost every $B_n^c$ belongs to only a finite number of $B_n$, and the converse is also true, we have $\limsup_{n\to\infty} B_n = \left(\bigcup_{i=1}^{\infty}\bigcap_{k=i}^{\infty} B_k^c\right)^c$, i.e.,

$$\limsup_{n\to\infty} B_n = \bigcap_{i=1}^{\infty}\bigcup_{k=i}^{\infty} B_k, \qquad (1.5.17)$$

which is alternatively written as lim sup Bn = {infinitely often Bn } with ‘infinitely


n→∞
often’ also written as i.o. It is noteworthy that

23 The acronym sup stands for supremum or superior.



$$\{\text{finitely often } B_n\} = \{\text{infinitely often } B_n\}^c = \bigcup_{i=1}^{\infty}\bigcap_{k=i}^{\infty} B_k^c, \qquad (1.5.18)$$

where 'finitely often' is often written as f.o.


Example 1.5.14 Obtain the upper bound set for the sequence $\{B_n\} = \{0,1,3\}, \{0,2\}, \{0,1,2\}, \{0,1\}, \{0,1,2\}, \{0,1\}, \ldots$ considered in Example 1.5.13.

Solution Because 0, 1, and 2 belong to infinitely many sets and 3 belongs to one set, the upper bound set is $\{0, 1, 2\}$. This result can be confirmed as $\limsup_{n\to\infty} B_n = \bigcap_{i=1}^{\infty}\bigcup_{k=i}^{\infty} B_k = \bigcap_{i=1}^{\infty} H_i = \{0, 1, 2\}$ by noting that $H_1 = \bigcup_{k=1}^{\infty} B_k = \{0, 1, 2, 3\}$, $H_2 = \bigcup_{k=2}^{\infty} B_k = \{0, 1, 2\}$, $H_3 = \bigcup_{k=3}^{\infty} B_k = \{0, 1, 2\}$, $H_4 = \bigcup_{k=4}^{\infty} B_k = \{0, 1, 2\}$, $\ldots$. Similarly, assuming $\Omega = \{0, 1, 2, 3, 4\}$ for example, we have $B_1^c = \{2, 4\}$, $B_2^c = \{1, 3, 4\}$, $B_3^c = \{3, 4\}$, $B_4^c = \{2, 3, 4\}$, $B_5^c = \{3, 4\}$, $B_6^c = \{2, 3, 4\}$, $\ldots$, and thus $\bigcap_{k=1}^{\infty} B_k^c = \{4\}$, $\bigcap_{k=2}^{\infty} B_k^c = \{3, 4\}$, $\bigcap_{k=3}^{\infty} B_k^c = \{3, 4\}$, $\ldots$. Therefore, the upper bound can be obtained also as $\limsup_{n\to\infty} B_n = \left(\bigcup_{i=1}^{\infty}\bigcap_{k=i}^{\infty} B_k^c\right)^c = (\{4\} \cup \{3, 4\} \cup \{3, 4\} \cup \cdots)^c = \{3, 4\}^c = \{0, 1, 2\}$. ♦
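The bounds computed in Examples 1.5.13 and 1.5.14 can be mechanized with Python sets. Because the sequence is eventually periodic, a finite truncation of the tails (the cutoff `N` below is our arbitrary choice) already stabilizes to the true lim inf and lim sup.

```python
def B(n):
    """The sequence {0,1,3}, {0,2}, {0,1,2}, {0,1}, {0,1,2}, {0,1}, ... of (1.5.16)."""
    if n == 1:
        return {0, 1, 3}
    if n == 2:
        return {0, 2}
    return {0, 1, 2} if n % 2 == 1 else {0, 1}

N = 50                                    # truncation of the infinite tails
tail_caps = [set.intersection(*(B(k) for k in range(i, N + 1))) for i in range(1, N)]
tail_cups = [set.union(*(B(k) for k in range(i, N + 1))) for i in range(1, N)]
lower = set.union(*tail_caps)             # lim inf: union of tail intersections, (1.5.15)
upper = set.intersection(*tail_cups)      # lim sup: intersection of tail unions, (1.5.17)
print(lower, upper)                       # {0, 1} and {0, 1, 2}
```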
Let us note that in an infinite sequence of sets, any element belonging to almost
every set belongs to infinitely many sets and thus

$$\liminf_{n\to\infty} B_n \subseteq \limsup_{n\to\infty} B_n \qquad (1.5.19)$$

is always true. On the other hand, as we mentioned before, an element belonging to


infinitely many sets may or may not belong to the remaining infinitely many sets: for
example, we can imagine an element belonging to all the odd-numbered sets but not
in any even-numbered set. Consequently, an element belonging to infinitely many
sets does not necessarily belong to almost every set. In short, in some cases we have

$$\limsup_{n\to\infty} B_n \supsetneq \liminf_{n\to\infty} B_n, \qquad (1.5.20)$$

which, together with (1.5.19), confirms the intuitive observation that the upper bound
is not smaller than the lower bound.
In Definition 1.5.5, we addressed the limit of monotonic sequences. Let us now
extend the discussion to the limits of general sequences of sets.
Definition 1.5.8 (convergence of sequence; limit set) If

$$\limsup_{n\to\infty} B_n \subseteq \liminf_{n\to\infty} B_n \qquad (1.5.21)$$

holds true for a sequence $\{B_n\}_{n=1}^{\infty}$ of sets, then

$$\limsup_{n\to\infty} B_n = \liminf_{n\to\infty} B_n = B \qquad (1.5.22)$$

from (1.1.2) using (1.5.19) and (1.5.21). In such a case, the sequence $\{B_n\}_{n=1}^{\infty}$ is said to converge to $B$, which is denoted by $B_n \to B$ or $\lim_{n\to\infty} B_n = B$. The set $B$ is called the limit set or limit of $\{B_n\}_{n=1}^{\infty}$.

The limit of monotonic sequences described in Definition 1.5.5 is in agreement with Definition 1.5.8: let us confirm this fact. Assume $\{B_n\}_{n=1}^{\infty}$ is a non-decreasing sequence. Because $\bigcap_{k=i}^{n} B_k = B_i$ for any $i$, we have $\bigcap_{k=i}^{\infty} B_k = \lim_{n\to\infty}\bigcap_{k=i}^{n} B_k = B_i$, with which we get $\liminf_{n\to\infty} B_n = \bigcup_{i=1}^{\infty}\bigcap_{k=i}^{\infty} B_k$ as

$$\liminf_{n\to\infty} B_n = \bigcup_{i=1}^{\infty} B_i = \lim_{n\to\infty} B_n. \qquad (1.5.23)$$

We also have $\bigcup_{k=i}^{\infty} B_k = \lim_{n\to\infty}\bigcup_{k=i}^{n} B_k = \lim_{n\to\infty} B_n$ from $\bigcup_{k=i}^{n} B_k = B_n$ for any value of $i$. Thus, we have $\limsup_{n\to\infty} B_n = \bigcap_{i=1}^{\infty}\bigcup_{k=i}^{\infty} B_k = \bigcap_{i=1}^{\infty}\lim_{n\to\infty} B_n$, consequently resulting in

$$\limsup_{n\to\infty} B_n = \lim_{n\to\infty} B_n = \liminf_{n\to\infty} B_n. \qquad (1.5.24)$$

Next, assume $\{B_n\}_{n=1}^{\infty}$ is a non-increasing sequence. Then, $\liminf_{n\to\infty} B_n = \bigcup_{i=1}^{\infty}\bigcap_{k=i}^{\infty} B_k = \bigcup_{i=1}^{\infty}\lim_{n\to\infty} B_n = \lim_{n\to\infty} B_n$ because $\bigcap_{k=i}^{\infty} B_k = \lim_{n\to\infty} B_n$ from $\bigcap_{k=i}^{n} B_k = B_n$ for any $i$. We also have $\limsup_{n\to\infty} B_n = \bigcap_{i=1}^{\infty}\bigcup_{k=i}^{\infty} B_k = \bigcap_{i=1}^{\infty} B_i = \lim_{n\to\infty} B_n$ because $\bigcup_{k=i}^{\infty} B_k = \lim_{n\to\infty}\bigcup_{k=i}^{n} B_k = B_i$ from $\bigcup_{k=i}^{n} B_k = B_i$ for any $i$.
  ∞
Example 1.5.15 Obtain the limit of {Bn }∞
n=1 = 1 − n1 , 3 − n1 n=1 .
     
Solution First, because B1 = (0, 2), B2 = 21 , 25 , B3 = 23 , 83 , B4 = 43 , 11
4
, . . .,
∞ ∞  5 ∞
we have G 1 = ∩ Bk = [1, 2), G 2 = ∩ Bk = 1, 2 , · · · and H1 = ∪ Bk = (0, 3),
k=1 k=2 k=1
∞ 1  ∞
H2 = ∪ Bk = 2 , 3 , · · · . Therefore, the lower bound is lim inf Bn = ∪ G i =
k=2 n→∞ i=1

[1, 3), the upper bound is lim sup Bn = ∩ Hi = [1, 3), and the limit is lim Bn =
n→∞ i=1 n→∞
1.5 Limits of Sequences of Sets 63

[1,
 3). 1We can1  similarly
 show  all1 [1, 3)
that the limits are 24
 for the sequences
∞ 1 ∞ ∞
1 − n , 3 − n n=1 , 1 − n , 3 − n n=1 , and 1 − n , 3 − n1 n=1 .
1

  ∞
Example 1.5.16 Obtain the limit of the sequence 1 + n1 , 3 + n1 n=1 of intervals.
     
Solution First, because B1 = (2, 4), B2 = 23 , 27 , B3 = 43 , 10
3
, B4 = 54 , 13
4
, . . .,
∞ ∞ 3  ∞
we have G 1 = ∩ Bk = (2, 3], G 2 = ∩ Bk = 2 , 3 , · · · and H1 = ∪ Bk = (1, 4),
k=1 k=2 k=1
∞  7 ∞
H2 = ∪ Bk = 1, 2 , · · · . Thus, the lower bound is lim inf Bn = ∪ G i = (1, 3], the
k=2 n→∞ i=1

upper bound is lim sup Bn = ∩ Hi = (1, 3], and the limit set is lim Bn = (1, 3].
i=1 n→∞
n→∞   ∞
We can similarly show that the limits of the sequences 1 + n1 , 3 + n1 n=1 ,
  ∞   ∞
1 + n1 , 3 + n1 n=1 , and 1 + n1 , 3 + n1 n=1 are all (1, 3]. ♦

Appendices

Appendix 1.1 Binomial Coefficients in the Complex Space

For the factorial n! = n(n − 1) · · · 1 defined in (1.4.57), the number n is a natural


number. Based on the gamma function addressed in Sect. 1.4.3, we extend the fac-
torial into the complex space, which will in turn be used in the discussion of the
permutation and binomial coefficients (Riordan 1968; Tucker 2002; Vilenkin 1971)
in the complex space.

(A) Factorials and Permutations in the Complex Space

Recollecting that

α! = Γ (α + 1) (1.A.1)

from Γ (α + 1) = αΓ (α) shown in (1.4.75), the factorial p! can be expressed as



±∞, p ∈ J− ,
p! = (1.A.2)
Γ ( p + 1), p ∈
/ J−

for a complex number p, where J− = {−1, −2, . . .} denotes the set of negative
integers. Therefore, 0! = Γ (1) = 1 for p = 0.

24 As it is mentioned in Examples 1.5.11 and 1.5.12, when the lower end value of an interval is
in the form of a − n1 and a + n1 , the limit is in the form of ‘[a, . . .’ and ‘(a, . . .’, respectively. In
addition, when the upper end value of an interval is in the form of b + n1 and b − n1 , the limit is in
the form of ‘. . . , b]’ and ‘. . . , b)’, respectively, for both open and closed ends.
64 1 Preliminaries
 
Example 1.A.1 From (1.A.2), it is easy to see that (−2)! = ±∞ and that − 21 ! =
1 √
Γ 2 = π from (1.4.83). ♦

For the permutation n Pk = (n−k)!


n!
defined in (1.4.59), it is assumed that n is a non-
negative integer and k = 0, 1, . . . , n. Based on (1.4.92) and (1.A.2), the permutation
Γ ( p+1)
p Pq = Γ ( p−q+1) can now be generalized as

⎧ Γ ( p+1)

⎪ , p ∈
/ J− and p−q ∈
/ J− ,
⎨ Γ ( p−q+1)
(−1)q Γ Γ(−(−p+q) , p ∈ J− and p−q ∈ J− ,
p Pq = p) (1.A.3)

⎪ ∈
/ J− p−q ∈ J− ,
⎩ 0, p and
±∞, p ∈ J− and p−q ∈
/ J−

for complex numbers p and q, where the expression (−1)q Γ Γ(−(−p+q) p)


in the second
line of the right-hand side can also be written as (−1)q − p+q−1 Pq .

Example 1.A.2 For any number z, z P0 = 1 and z P1 = z. ♦

Example 1.A.3 It follows that



0, z is a natural number,
0 Pz = (1.A.4)
1
Γ (1−z)
, otherwise,


0, z = 2, 3, . . . ,
1 Pz = (1.A.5)
1
Γ (2−z)
, otherwise,


±∞, z ∈ J− ,
z Pz = (1.A.6)
/ J− ,
Γ (z + 1), z ∈

and

(−1)z z Pz , z = 0, 1 . . . ,
−1 Pz =
±∞, otherwise

(−1)z Γ (z + 1), z = 0, 1 . . . ,
= (1.A.7)
±∞, otherwise

from (1.A.3). ♦

Using (1.A.3), we can also get −2 P−0.3 = ±∞, −0.1 P1.9 = 0, 0 P3 = ΓΓ(−2) (1)
= 0,

Γ(2)
3
Γ(2)
3
π Γ (4) Γ (4)
1 P3 = = 8 , 21 P0.8 = Γ (0.7) = 2Γ (0.7) , 3 P 21 = Γ 7 = 5√π , 3 P− 21 = Γ 9 =
3 16
2 Γ (− 23 ) (2) (2)
Γ (4) Γ ( 21 ) Γ (−1) Γ (5)
32
√ , P
35 π 3 −2
= Γ (6) = 20 , − 21 P3 = Γ − 5 = − 15 , and −2 P3 = Γ (−4) = (−1) Γ (2)
1 8
=
( 2)
−24. Table 1.3 shows some values of the permutation p Pq .
Table 1.3 Some values of the permutation _pP_q (Here, ∗ denotes ±∞)

 p\q  |  −2    −3/2        −1    −1/2       0   1/2      1     3/2       2
 −2   |  ∗     ∗           −1    ∗          1   ∗        −2    ∗         6
 −3/2 |  −4    −2√π        −2    0          1   0        −3/2  0         15/4
 −1   |  ∗     ∗           ∗     ∗          1   ∗        −1    ∗         2
 −1/2 |  4/3   √π          2     √π         1   0        −1/2  0         3/4
 0    |  1/2   4/(3√π)     1     2/√π       1   1/√π     0     −1/(2√π)  0
 1/2  |  4/15  √π/4        2/3   √π/2       1   √π/2     1/2   0         −1/4
 1    |  1/6   8/(15√π)    1/2   4/(3√π)    1   2/√π     1     1/√π      0
 3/2  |  4/35  √π/8        2/5   3√π/8      1   3√π/4    3/2   3√π/4     3/4
 2    |  1/12  32/(105√π)  1/3   16/(15√π)  1   8/(3√π)  2     4/√π      2
(B) Binomial Coefficients in the Complex Space

For the binomial coefficient _nC_k = n!/{(n−k)!k!} defined in (1.4.60), n and k are non-negative integers with n ≥ k. Based on the gamma function described in Sect. 1.4.3, we can define the binomial coefficient in the complex space: specifically, employing (1.4.92), the binomial coefficient _pC_q = Γ(p+1)/{Γ(p−q+1)Γ(q+1)} for complex numbers p and q can be defined as described in Table 1.4.

Example 1.A.4 When both p and p − q are negative integers and q is a non-negative integer, the binomial coefficient _pC_q = (−1)^q _{−p+q−1}C_q can be expressed also as

Table 1.4 The binomial coefficient _pC_q = Γ(p+1)/{Γ(p−q+1)Γ(q+1)} in the complex space

 Is p ∈ J₋? | Is q ∈ J₋? | Is p−q ∈ J₋? | _pC_q
 No         | No         | No           | Γ(p+1)/{Γ(p−q+1)Γ(q+1)}
 Yes        | Yes        | No           | (−1)^{p−q} Γ(−q)/{Γ(p−q+1)Γ(−p)} = (−1)^{p−q} _{−q−1}C_{p−q}
 Yes        | No         | Yes          | (−1)^q Γ(−p+q)/{Γ(−p)Γ(q+1)} = (−1)^q _{−p+q−1}C_q
 Yes        | Yes        | Yes          | 0
 No         | Yes        | No           | 0
 No         | No         | Yes          | 0
 Yes        | No         | No           | ±∞

 Note. Among the three numbers p, q, and p − q, it is possible that
 only p − q is not a negative integer (e.g., p = −2, q = −3, and p − q = 1)
 and only q is not a negative integer (e.g., p = −3, q = 2, and p − q = −5),
 but it is not possible that only p is not a negative integer.
 In other words, when q and p − q are both negative integers, p is also a negative integer.

  _pC_q = (−1)^q _{−p+q−1}C_{−p−1}.   (1.A.8)

Now, when p is a negative non-integer real number and q is a non-negative integer, the binomial coefficient can be written as _pC_q = {Γ(p+1)/Γ(p−q+1)}·{1/Γ(q+1)} = (−1)^q {Γ(−p+q)/Γ(−p)}·{1/Γ(q+1)} = (−1)^q (−p+q−1)!/{(−p−1)!q!} or as

  _pC_q = (−1)^q _{−p+q−1}C_q   (1.A.9)

by recollecting Γ(α+1)/Γ(β+1) = (−1)^{α−β} Γ(−β)/Γ(−α) shown in (1.4.91) for α − β an integer, α < 0, and β < 0. The two formulas (1.A.8) and (1.A.9) are the same as _{−r}C_x = (−1)^x _{r+x−1}C_x, which we will see in (2.5.15) for a negative real number −r and a non-negative integer x. ♦
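The case table for the generalized binomial coefficient can likewise be coded for quick numerical checks; the sketch below (function name binom and math.inf for ±∞ are our own conventions) follows the rows of Table 1.4:

```python
import math

def binom(p, q):
    """Generalized binomial coefficient pCq = Gamma(p+1)/(Gamma(p-q+1)Gamma(q+1)),
    evaluated case by case as in Table 1.4."""
    neg_int = lambda x: x < 0 and abs(x - round(x)) < 1e-12
    P, Q, D = neg_int(p), neg_int(q), neg_int(p - q)
    if not P and not Q and not D:
        return math.gamma(p + 1) / (math.gamma(p - q + 1) * math.gamma(q + 1))
    if P and Q and not D:
        return (-1) ** round(p - q) * math.gamma(-q) / (math.gamma(p - q + 1) * math.gamma(-p))
    if P and not Q and D:
        return (-1) ** round(q) * math.gamma(-p + q) / (math.gamma(-p) * math.gamma(q + 1))
    if P and not Q and not D:
        return math.inf
    return 0.0  # the three remaining rows of Table 1.4
```

For example, binom(-3, -5) returns 6, matching _{−3}C_{−5} = _4C_2.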

Example 1.A.5 From Table 1.4, we promptly get _zC_0 = _zC_z = 1 and _zC_1 = _zC_{z−1} = z. In addition, _0C_z = _0C_{−z} and _1C_z = _1C_{1−z} = 1/{Γ(2−z)Γ(1+z)} can be expressed as

  _0C_z = 1/{Γ(1−z)Γ(1+z)}
        = 1,                  z = 0,
        = 0,                  z = ±1, ±2, …,       (1.A.10)
        = 1/{Γ(1−z)Γ(1+z)},   otherwise

and

  _1C_z = 1,                  z = 0, 1,
        = 0,                  z = −1, ±2, ±3, …,   (1.A.11)
        = 1/{Γ(2−z)Γ(1+z)},   otherwise,

respectively. ♦
We can similarly obtain²⁵ _{−3}C_{−2} = _{−3}C_{−1} = (−3)!/{(−2)!(−1)!} = 0, _{−3}C_2 = _{−3}C_{−5} = (−3)!/{(−5)!2!} = (−1)² 4!/(2!2!) = _4C_2, _{−7}C_3 = _{−7}C_{−10} = (−7)!/{(−10)!3!} = (−1)³ 9!/(6!3!) = −_9C_3, and _{5/2}C_2 = _{5/2}C_{1/2} = Γ(7/2)/{Γ(3)Γ(3/2)} = 15/8. Table 1.5 shows some values of the binomial coefficient.

Example 1.A.6 Obtain the series expansion of h(z) = (1 + z)^p for p a real number.

Solution First, when p ≥ 0 or when p < 0 and |z| < 1, we have (1 + z)^p = Σ_{k=0}^{∞} {h^{(k)}(0)/k!} z^k = Σ_{k=0}^{∞} (1/k!) p(p−1)⋯(p−k+1) z^k, i.e.,

25 The cases −1 Cz and −2 Cz are addressed in Exercise 1.39.


Table 1.5 Values of the binomial coefficient _pC_q (Here, ∗ denotes ±∞)

 p\q  |  −2   −3/2         −1   −1/2        0   1/2       1     3/2       2
 −2   |  1    ∗            0    ∗           1   ∗         −2    ∗         3
 −3/2 |  0    1            0    0           1   0         −3/2  0         15/8
 −1   |  −1   ∗            1    ∗           1   ∗         −1    ∗         1
 −1/2 |  0    −1/2         0    1           1   0         −1/2  0         3/8
 0    |  0    −2/(3π)      0    2/π         1   2/π       0     −2/(3π)   0
 1/2  |  0    −1/8         0    1/2         1   1         1/2   0         −1/8
 1    |  0    −4/(15π)     0    4/(3π)      1   4/π       1     4/(3π)    0
 3/2  |  0    −1/16        0    3/8         1   3/2       3/2   1         3/8
 2    |  0    −16/(105π)   0    16/(15π)    1   16/(3π)   2     16/(3π)   1



  (1 + z)^p = Σ_{k=0}^{∞} _pC_k z^k.   (1.A.12)

Note that (1.A.12) is the same as the binomial expansion

  (1 + z)^p = Σ_{k=0}^{p} _pC_k z^k   (1.A.13)

of (1 + z)^p because _pC_k = 0 for k = p+1, p+2, … when p is 0 or a natural number. Next, recollecting (1.A.12) and (1 + z)^p = z^p (1 + 1/z)^p, we get

  (1 + z)^p = Σ_{k=0}^{∞} _pC_k z^{p−k}   (1.A.14)

for p < 0 and |z| > 1. Combining (1.A.12) and (1.A.14), we eventually get

  (1 + z)^p = Σ_{k=0}^{∞} _pC_k z^k,      for p ≥ 0, or for p < 0 and |z| < 1,
            = Σ_{k=0}^{∞} _pC_k z^{p−k},  for p < 0 and |z| > 1,               (1.A.15)

(1 + z)^p = 2^p for p < 0 and z = 1, and (1 + z)^p → ∞ for p < 0 and z = −1. Note that the term _pC_k in (1.A.15) is always finite because the case of only p being a negative integer among p, k, and p − k is not possible when k is an integer. ♦

Example 1.A.7 Because _{−1}C_k = (−1)^k for k = 0, 1, … as shown in (1.E.27), we get

  1/(1 + z) = Σ_{k=0}^{∞} (−1)^k z^k,        |z| < 1,
            = Σ_{k=0}^{∞} (−1)^k z^{−1−k},   |z| > 1
            = 1 − z + z² − z³ + ⋯,           |z| < 1,
            = 1/z − 1/z² + 1/z³ − ⋯,         |z| > 1       (1.A.16)

from (1.A.15) with p = −1. ♦


Example 1.A.8 Employing (1.A.15) and the result for _{−2}C_k shown in (1.E.28), we get

  1/(1 + z)² = Σ_{k=0}^{∞} (−1)^k (k+1) z^k,        |z| < 1,
             = Σ_{k=0}^{∞} (−1)^k (k+1) z^{−2−k},   |z| > 1
             = 1 − 2z + 3z² − 4z³ + ⋯,              |z| < 1,
             = 1/z² − 2/z³ + 3/z⁴ − ⋯,              |z| > 1.   (1.A.17)

Alternatively, from 1/(1 + z)² = 1/{1 − (−2z − z²)} = Σ_{k=0}^{∞} (−2z − z²)^k for |−2z − z²| < 1, we have

  1/(1 + z)² = 1 + (−2z − z²) + (4z² + 4z³ + z⁴) + (−8z³ + ⋯) + ⋯,   (1.A.18)

which can be rewritten as

  1 − 2z + (−z² + 4z²) + (4z³ − 8z³) + ⋯ = 1 − 2z + 3z² − 4z³ + ⋯   (1.A.19)

by changing the order in the addition. The result²⁶ (1.A.19) is the same as (1.A.17)
for |z| < 1.
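Both branches of (1.A.16) and (1.A.17) can be checked against their closed forms by summing a few hundred terms; a small sketch (the test points and the truncation at 200 terms are arbitrary choices of ours):

```python
# partial sums of (1.A.16) and (1.A.17) against the closed forms
z_in, z_out = 0.3, 2.0   # one point inside |z| < 1, one outside

s16_in = sum((-1) ** k * z_in ** k for k in range(200))
s16_out = sum((-1) ** k * z_out ** (-1 - k) for k in range(200))
s17_in = sum((-1) ** k * (k + 1) * z_in ** k for k in range(200))
s17_out = sum((-1) ** k * (k + 1) * z_out ** (-2 - k) for k in range(200))
```

All four partial sums agree with 1/(1+z) and 1/(1+z)² to machine precision at these points.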

(C) Two Equalities for Binomial Coefficients

Theorem 1.A.1 For γ ∈ {0, 1, …} and any two numbers α and β, we have

²⁶ In writing (1.A.19) from (1.A.18), we assume {|−2z − z²| < 1, 0 < |Re(z) + 1| ≤ √2}, a proper subset of the region |z| < 1. Here, {|−2z − z²| < 1, 0 < |Re(z) + 1| ≤ √2} is the right half of the dumbbell-shaped region |−2z − z²| < 1, which is a proper subset of the rectangle {|Im(z)| ≤ 1/2, |Re(z) + 1| ≤ √2}.
  Σ_{m=0}^{γ} _αC_{γ−m} _βC_m = _{α+β}C_γ,   (1.A.20)

which is called the Chu-Vandermonde convolution or Vandermonde convolution.

Theorem 1.A.1 is proved in Exercise 1.35. The result (1.A.20) is the same as the Hagen-Rothe identity

  Σ_{m=0}^{γ} {(α−γc)/(α−mc)} _{α−mc}C_{γ−m} {β/(β+mc)} _{β+mc}C_m = {(α+β−γc)/(α+β)} _{α+β}C_γ   (1.A.21)

with c = 0 and Gauss' hypergeometric theorem

  ₂F₁(a, b; c; 1) = Γ(c)Γ(c−a−b)/{Γ(c−a)Γ(c−b)},   Re(c) > Re(a+b)   (1.A.22)

with a = γ, b = α + β − γ, and c = α + β + 1. In (1.A.22),

  ₂F₁(a, b; c; z) = Σ_{n=0}^{∞} {(a)_n (b)_n / (c)_n} z^n/n!,   Re(c) > Re(b) > 0   (1.A.23)

is the hypergeometric function²⁷, and can be expressed also as

  ₂F₁(a, b; c; z) = {1/B(b, c−b)} ∫₀¹ x^{b−1} (1−x)^{c−b−1} (1−zx)^{−a} dx   (1.A.24)

in terms of Euler's integral formula.


Example 1.A.9 In (1.A.20), assume α = 2, β = 21 , and γ = 2. Then, the left-hand
  1    1    1 
side is 22 02 + 21 12 + 02 22 = 1 + 2 −( 21 )!1! + −( 23 )!2! = 1 + 1 + ( 2 )(2! 2 ) = 15
1
! 1
! 1
−1
( 2) ( 2) 8
5 ( 5
)!
and the right-hand side is 22 = 1 !2! = 8 .
2 15

(2)
Example 1.A.10 Consider the case β = n for Σ_{m=0}^{n} _αC_{γ−m} _βC_m, where α is not a negative integer and n ∈ {0, 1, …}. When γ = 0, 1, …, n, we have Σ_{m=0}^{n} _αC_{γ−m} _nC_m = Σ_{m=0}^{γ} _αC_{γ−m} _nC_m + Σ_{m=γ+1}^{n} _αC_{γ−m} _nC_m = _{α+n}C_γ noting that _αC_{γ−m} = α!/{(α−γ+m)!(γ−m)!} = 0 for m = γ+1, γ+2, … due to (γ−m)! = ±∞. Similarly, when γ = n+1, n+2, …, we have Σ_{m=0}^{n} _αC_{γ−m} _nC_m = Σ_{m=0}^{γ} _αC_{γ−m} _nC_m − Σ_{m=n+1}^{γ} _αC_{γ−m} _nC_m = _{α+n}C_γ because _βC_m = _nC_m = n!/{(n−m)!m!} = 0 from (n−m)! = ±∞ for m = n+1, n+2, …. In short, we have

  Σ_{m=0}^{n} _αC_{γ−m} _nC_m = _{α+n}C_γ   (1.A.25)

for γ ∈ {0, 1, …} when n ∈ {0, 1, …}. ♦

²⁷ The function ₂F₁ is also called Gauss' hypergeometric function, and is a special case of the generalized hypergeometric function

  _pF_q(α₁, α₂, …, α_p; β₁, β₂, …, β_q; z) = Σ_{k=0}^{∞} {(α₁)_k (α₂)_k ⋯ (α_p)_k / (β₁)_k (β₂)_k ⋯ (β_q)_k} z^k/k!.

Also, note that ₂F₁(1, 1; 3/2; 1/2) = π/2.
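The Chu-Vandermonde convolution (1.A.20) holds for arbitrary (even rational) α and β, which makes an exact check with fractions.Fraction convenient; in the sketch below, the helper C (a falling-factorial binomial, our own naming) assumes the lower index is a non-negative integer:

```python
import math
from fractions import Fraction

def C(x, k):
    """Generalized binomial coefficient xCk = x(x-1)...(x-k+1)/k! for integer k >= 0."""
    out = Fraction(1)
    for i in range(k):
        out *= Fraction(x) - i
    return out / math.factorial(k)

def vandermonde_lhs(a, b, g):
    # left-hand side of (1.A.20)
    return sum(C(a, g - m) * C(b, m) for m in range(g + 1))
```

With α = 2, β = 1/2, γ = 2 this reproduces the value 15/8 of Example 1.A.9 exactly.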

Theorem 1.A.2 We have

  Σ_{m=0}^{γ} [m]_ζ _αC_{γ−m} _βC_m = [β]_ζ _{α+β−ζ}C_{γ−ζ}   (1.A.26)

for ζ ∈ {0, 1, …, γ} and γ ∈ {0, 1, …}.

Proof (Method 1) Let us employ mathematical induction. First, when ζ = 0, (1.A.26) holds true for any values of γ ∈ {0, 1, …}, α, and β from Theorem 1.A.1. Assume (1.A.26) holds true when ζ = ζ₀: in other words, for any value of γ ∈ {0, 1, …}, α, and β, assume

  Σ_{m=0}^{γ} [m]_{ζ₀} _αC_{γ−m} _βC_m = Σ_{m=ζ₀}^{γ} [m]_{ζ₀} _αC_{γ−m} _βC_m
                                       = [β]_{ζ₀} _{α+β−ζ₀}C_{γ−ζ₀}   (1.A.27)

holds true. Then, noting that (m+1) _βC_{m+1} = β _{β−1}C_m and [m+1]_{ζ₀+1} = (m+1)[m]_{ζ₀}, we get Σ_{m=0}^{γ} [m]_{ζ₀+1} _αC_{γ−m} _βC_m = Σ_{m=ζ₀+1}^{γ} [m]_{ζ₀+1} _αC_{γ−m} _βC_m = Σ_{m=ζ₀}^{γ−1} [m+1]_{ζ₀+1} _αC_{γ−m−1} _βC_{m+1} = Σ_{m=ζ₀}^{γ−1} [m]_{ζ₀} _αC_{γ−1−m} (m+1) _βC_{m+1} from (1.A.27), i.e.,

  Σ_{m=0}^{γ} [m]_{ζ₀+1} _αC_{γ−m} _βC_m = β Σ_{m=ζ₀}^{γ−1} [m]_{ζ₀} _αC_{γ−1−m} _{β−1}C_m
                                         = β [β−1]_{ζ₀} _{α+β−1−ζ₀}C_{γ−1−ζ₀}
                                         = [β]_{ζ₀+1} _{α+β−(ζ₀+1)}C_{γ−(ζ₀+1)}.   (1.A.28)

The result (1.A.28) implies that (1.A.26) holds true also when ζ = ζ₀+1 if (1.A.26) holds true when ζ = ζ₀. In short, (1.A.26) holds true for ζ ∈ {0, 1, …}.

(Method 2) Noting (Charalambides 2002; Gould 1972) that [m]_ζ _βC_m = [m]_ζ {[β]_ζ [β−ζ]_{m−ζ}}/{[m]_ζ (m−ζ)!} = [β]_ζ _{β−ζ}C_{m−ζ}, we can rewrite (1.A.26) as Σ_{m=ζ}^{γ} _αC_{γ−m} _{β−ζ}C_{m−ζ} = _{α+β−ζ}C_{γ−ζ}, which is the same as the Chu-Vandermonde convolution Σ_{k=0}^{γ−ζ} _αC_{γ−ζ−k} _{β−ζ}C_k = _{α+β−ζ}C_{γ−ζ}. ♠

It is noteworthy that (1.A.26) holds true also when ζ ∈ {γ + 1, γ + 2, …}, in
which case the value of (1.A.26) is 0.

Example 1.A.11 Assume α = 7, β = 3, γ = 6, and ζ = 2 in (1.A.26). Then, the left-hand side is 2·_7C_4 _3C_2 + 6·_7C_3 _3C_3 = 420 and the right-hand side is 6·_8C_4 = 420. ♦

Example 1.A.12 Assume α = −4, β = −1, γ = 3, and ζ = 2 in (1.A.26). Then, the left-hand side is 2·_{−4}C_1 _{−1}C_2 + 6·_{−4}C_0 _{−1}C_3 = 2 × (−4)!/{(−5)!1!} × (−1)!/{(−3)!2!} + 6 × (−4)!/{(−4)!0!} × (−1)!/{(−4)!3!} = −14 and the right-hand side is (−1)(−2)·_{−7}C_1 = 2 × (−7)!/{(−8)!1!} = −14. ♦

Example 1.A.13 The identity (1.A.26) holds true also for non-integer values of α or β. For example, when α = 1/2, β = −1/2, γ = 2, and ζ = 1, the left-hand side is _{1/2}C_1 _{−1/2}C_1 + 2·_{1/2}C_0 _{−1/2}C_2 = {(1/2)!/((−1/2)!1!)}{(−1/2)!/((−3/2)!1!)} + 2·{(1/2)!/((1/2)!0!)}{(−1/2)!/((−5/2)!2!)} = 1/2 and the right-hand side is −(1/2) × _{−1}C_1 = −(1/2) × Γ(0)/Γ(−1) = −(1/2) × (−1)!/{(−2)!1!} = 1/2. ♦

Example 1.A.14 Denoting the unit imaginary number by j = √−1, assume α = e − j, β = π + 2j, γ = 4, and ζ = 2 in (1.A.26). Then, the left-hand side is 0 + 0 + 2·_αC_2 _βC_2 + 6·_αC_1 _βC_3 + 12·_αC_0 _βC_4 = (1/2)α(α−1)β(β−1) + αβ(β−1)(β−2) + (1/2)β(β−1)(β−2)(β−3) = (1/2)β(β−1){α(α−1) + 2α(β−2) + (β−2)(β−3)} = (1/2)β(β−1){α² + α(2β−5) + (β−2)(β−3)} = (1/2)β(β−1)(α+β−2)(α+β−3) and the right-hand side is also β(β−1)·_{α+β−2}C_2 = β(β−1)(α+β−2)!/{(α+β−4)!2!} = (1/2)β(β−1)(α+β−2)(α+β−3). ♦

Example 1.A.15 When ζ = 1 and ζ = 2, (1.A.26) can be specifically written as

  Σ_{m=0}^{γ} m _αC_{γ−m} _βC_m = β _{α+β−1}C_{γ−1}
                                = {βγ/(α+β)} _{α+β}C_γ   (1.A.29)

and

  Σ_{m=0}^{γ} m(m−1) _αC_{γ−m} _βC_m = β(β−1) _{α+β−2}C_{γ−2}
                                     = {β(β−1)γ(γ−1)/((α+β)(α+β−1))} _{α+β}C_γ,   (1.A.30)

respectively. The two results (1.A.29) and (1.A.30) will later be useful for obtaining the mean and variance of the hypergeometric distribution in Exercise 3.68. ♦
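The factorial-moment identity (1.A.26) and the worked examples above can all be verified exactly with rational arithmetic; a self-contained sketch (helper names C, falling, and lhs are ours):

```python
import math
from fractions import Fraction

def C(x, k):
    """Generalized binomial coefficient xCk for integer k >= 0."""
    out = Fraction(1)
    for i in range(k):
        out *= Fraction(x) - i
    return out / math.factorial(k)

def falling(x, k):
    """Falling factorial [x]_k = x(x-1)...(x-k+1)."""
    out = Fraction(1)
    for i in range(k):
        out *= Fraction(x) - i
    return out

def lhs(a, b, g, z):
    # left-hand side of (1.A.26)
    return sum(falling(m, z) * C(a, g - m) * C(b, m) for m in range(g + 1))
```

Running it on the data of Examples 1.A.11-1.A.13 reproduces 420, −14, and 1/2.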

(D) Euler Reflection Formula

We now prove the Euler reflection formula

  Γ(1−x)Γ(x) = π/sin πx   (1.A.31)

for 0 < x < 1 mentioned in (1.4.79). First, if we let x = s/(s+1) in the defining equation (1.4.95) of the beta function, we get

  B̃(α, β) = ∫₀^∞ s^{α−1}/(s+1)^{α+β} ds.   (1.A.32)

Using (1.A.32) and Γ(α)Γ(β) = Γ(α+β)B̃(α, β) from (1.4.96) will lead us to

  Γ(1−x)Γ(x) = ∫₀^∞ s^{x−1}/(s+1) ds   (1.A.33)

for α = x and β = 1−x. To obtain the right-hand side of (1.A.33), we consider the contour integral

  ∮_C z^{x−1}/(z−1) dz   (1.A.34)

in the complex space. The contour C of the integral in (1.A.34) is shown in Fig. 1.26, a counterclockwise path along the outer circle. As there exists only one pole z = 1
inside the contour C, we get

  ∮_C z^{x−1}/(z−1) dz = 2πj   (1.A.35)

from the residue theorem ∮_C z^{x−1}/(z−1) dz = 2πj Res_{z=1} {z^{x−1}/(z−1)}. Consider the integral along C in four segments. First, we have z = Re^{jθ} and dz = jRe^{jθ} dθ over the segment from z₁ = Re^{j(−π+ε)} to z₂ = Re^{j(π−ε)} along the circle with radius R. Second, we have z = re^{j(π−ε)} and dz = e^{j(π−ε)} dr over the segment from z₂ = Re^{j(π−ε)} to z₃ = pe^{j(π−ε)} along the straight line toward the origin. Third, we have z = pe^{jθ} and dz = jpe^{jθ} dθ over the segment from z₃ = pe^{j(π−ε)} to z₄ = pe^{j(−π+ε)} clockwise along the circle with radius p. Fourth, we have z = re^{j(−π+ε)} and dz = e^{j(−π+ε)} dr over the segment from z₄ = pe^{j(−π+ε)} to z₁ = Re^{j(−π+ε)} along the straight line out of the origin. Thus, we have

  ∮_C z^{x−1}/(z−1) dz = ∫_{−π+ε}^{π−ε} (Re^{jθ})^{x−1} jRe^{jθ}/(Re^{jθ}−1) dθ + ∫_R^p (re^{j(π−ε)})^{x−1} e^{j(π−ε)}/(re^{j(π−ε)}−1) dr
    + ∫_{π−ε}^{−π+ε} (pe^{jθ})^{x−1} jpe^{jθ}/(pe^{jθ}−1) dθ + ∫_p^R (re^{j(−π+ε)})^{x−1} e^{j(−π+ε)}/(re^{j(−π+ε)}−1) dr,   (1.A.36)

which can be written as

  ∮_C z^{x−1}/(z−1) dz = ∫_{−π+ε}^{π−ε} jR^x e^{jθx}/(Re^{jθ}−1) dθ + ∫_R^p r^{x−1} e^{j(π−ε)x}/(re^{j(π−ε)}−1) dr
    + ∫_{π−ε}^{−π+ε} jp^x e^{jθx}/(pe^{jθ}−1) dθ + ∫_p^R r^{x−1} e^{j(−π+ε)x}/(re^{j(−π+ε)}−1) dr   (1.A.37)

after some steps. When x > 0 and p → 0, the third term on the right-hand side of (1.A.37) is lim_{p→0} ∫_{π−ε}^{−π+ε} jp^x e^{jθx}/(pe^{jθ}−1) dθ = 0. Similarly, the first term on the right-hand side of (1.A.37) vanishes because |jR^x e^{jθx}/(Re^{jθ}−1)| = R^x/√(R² − 2R cos θ + 1) ≤ R^x/(R−1) → 0 when x < 1 and R → ∞. Therefore, (1.A.37) can be written as

  ∮_C z^{x−1}/(z−1) dz = 0 + ∫_∞^0 r^{x−1} e^{jπx}/(−r−1) dr + 0 + ∫_0^∞ r^{x−1} e^{−jπx}/(−r−1) dr
    = (e^{jπx} − e^{−jπx}) ∫_0^∞ r^{x−1}/(r+1) dr
    = 2j sin πx ∫_0^∞ r^{x−1}/(r+1) dr   (1.A.38)

for 0 < x < 1 when R → ∞, p → 0, and ε → 0. In short, we have


Fig. 1.26 The contour C of the integral ∮_C z^{x−1}/(z−1) dz, where 0 < p < 1 < R
  ∫_0^∞ r^{x−1}/(r+1) dr = π/sin πx   (1.A.39)

for 0 < x < 1 from (1.A.35) and (1.A.38) and, subsequently, we have (1.A.31) for
0 < x < 1 from (1.A.33) and (1.A.39).
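The reflection formula (1.A.31) is also easy to confirm numerically with the standard library's gamma function; a quick sketch (function name reflection_gap and the sample points are our own):

```python
import math

def reflection_gap(x):
    """|Gamma(1-x)Gamma(x) - pi/sin(pi x)|, which (1.A.31) says is zero for 0 < x < 1."""
    return abs(math.gamma(1 - x) * math.gamma(x) - math.pi / math.sin(math.pi * x))
```

The gap is at the level of floating-point round-off at every point in (0, 1).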

Appendix 1.2 Some Results

(A) Stepping Stone

Consider crossing a creek via n − 1 stepping stones with steps 0 and n denoting the
two banks of the creek. Assume we can move only in one direction and skip either
0 or 1 step at each move.
(1) Obtain the number of ways we can complete the crossing in k moves.
(2) Obtain the number of ways we can complete the crossing.

Solution (1) Let a(n, k) be the number of ways we can complete the crossing in k moves and let

  n₂ = ⌈n/2⌉,   (1.A.40)

where ⌈x⌉ denotes the smallest integer not smaller than x and is called the ceiling function. Denote the numbers of moves in which we skip 0 and 1 step by k₁ and k₂, respectively. Then, from k₁ + k₂ = k and k₁ + 2k₂ = n, we get k₁ = 2k − n and k₂ = n − k. The number a(n, k) of ways we can complete the crossing in k moves is the same as the number k!/(k₁!k₂!) of ways of arranging k₁ 1's and k₂ 2's. In short, we have a(n, k) = k!/{(2k−n)!(n−k)!}, i.e.,

  a(n, k) = _kC_{n−k}   (1.A.41)



Table 1.6 Number a(n, k) = k Cn−k of ways we can cross a creek via n − 1 stepping stones in k
moves when we can move only in one direction and skip either 0 or 1 step at each move
n
1 2 3 4 5 6 7 8 9 10 11
k 1 1 1 0 0 0 0 0 0 0 0 0
2 0 1 2 1 0 0 0 0 0 0 0
3 0 0 1 3 3 1 0 0 0 0 0
4 0 0 0 1 4 6 4 1 0 0 0
5 0 0 0 0 1 5 10 10 5 1 0
6 0 0 0 0 0 1 6 15 20 15 6
7 0 0 0 0 0 0 1 7 21 35 35
8 0 0 0 0 0 0 0 1 8 28 56
9 0 0 0 0 0 0 0 0 1 9 36
10 0 0 0 0 0 0 0 0 0 1 10
11 0 0 0 0 0 0 0 0 0 0 1

for k = n₂, n₂+1, …, n; some values of a(n, k) are shown in Table 1.6.

(2) Denoting the number of ways we can complete the crossing by a(n), which is the same as the number of ways of arranging k 1's and n − k 2's for k = 1, 2, …, n, we get a(n) = Σ_{k=1}^{n} a(n, k), i.e.,

  a(n) = Σ_{k=n₂}^{n} _kC_{n−k}.   (1.A.42)

We also have a(1) = 1 and a(2) = 2. Therefore, a(n) is the sum of the number of ways from step n−1 to n and that from n−2 directly to n. We thus have a(n) = a(n−1) + a(n−2) for n = 3, 4, … because the number of ways to step n−1 is a(n−1) and that to step n−2 is a(n−2). Solving the recursion, we get²⁸

  a(n) = (1/√5) { ((1+√5)/2)^{n+1} − ((1−√5)/2)^{n+1} }.   (1.A.43)

Here, as is shown later in (1.A.58), the number a(n) is an integer when n ∈ {0, 1, …}. Table 1.7 shows a(n) for n = 1, 2, …, 20. Recollecting that _kC_{n−k} = 0 for k = 0, 1, …, n₂−1, we have

²⁸ One solving method is as follows: let a(n) = θⁿ in a(n) = a(n−1) + a(n−2). Then, θ = (1±√5)/2 from θ² − θ − 1 = 0. Next, from a(n) = c₁((1+√5)/2)ⁿ + c₂((1−√5)/2)ⁿ and the initial conditions a(1) = 1 and a(2) = 2, we get c₁ = (1+√5)/(2√5) and c₂ = −(1−√5)/(2√5).
Table 1.7 Number a(n) = (1/√5){((1+√5)/2)^{n+1} − ((1−√5)/2)^{n+1}} of ways we can cross a creek via n − 1 stepping stones when we can move only in one direction and skip either 0 or 1 step at each move

 n    1   2   3   4   5   6    7    8    9    10
 a(n) 1   2   3   5   8   13   21   34   55   89
 n    11  12  13  14  15  16   17   18   19   20
 a(n) 144 233 377 610 987 1597 2584 4181 6765 10946

  Σ_{k=0}^{n} _kC_{n−k} = (1/√5) { ((1+√5)/2)^{n+1} − ((1−√5)/2)^{n+1} }   (1.A.44)

from (1.A.42) and (1.A.43). The result (1.A.44) is the same as the well-known identity (Roberts and Tesman 2009) F_{n+1} = Σ_{k=0}^{⌊n/2⌋} _{n−k}C_k, where {F_n}_{n=0}^{∞} are Fibonacci numbers. Here, ⌊x⌋, also expressed as [x], denotes the greatest integer not larger than x and is called the floor function or Gauss function. Clearly, ⌊x⌋ = ⌈x⌉ = x when x is an integer while ⌊x⌋ = ⌈x⌉ − 1 when x is not
an integer.
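The counting argument, the binomial-sum form (1.A.42), and the closed form (1.A.43) can be cross-checked in a few lines of Python (function names a_nk, a_n, and a_closed are our own):

```python
import math

def a_nk(n, k):
    # (1.A.41): a(n, k) = kC_{n-k}; zero unless 0 <= n - k <= k
    return math.comb(k, n - k) if 0 <= n - k <= k else 0

def a_n(n):
    # (1.A.42): total number of ways to cross via n - 1 stones
    return sum(a_nk(n, k) for k in range(1, n + 1))

def a_closed(n):
    # (1.A.43), rounded back to the nearest integer
    s5 = 5 ** 0.5
    return round((((1 + s5) / 2) ** (n + 1) - ((1 - s5) / 2) ** (n + 1)) / s5)
```

The three agree with Table 1.7 and with the Fibonacci recursion a(n) = a(n−1) + a(n−2).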

If the condition 'skip either 0 or 1 step at each move' is replaced with 'skip either 0 step or 2 steps at each move', then we have

  Σ_{m=n₃, n₃+2, …}^{n} _mC_{(n−m)/2} = a₁₃ r₁₃ⁿ + 2Re(c₁₃ z₁₃ⁿ)   (1.A.45)

for n = 0, 1, …, where n₃ = n − 2⌊n/3⌋. In addition, the three numbers r₁₃ = (1/3)(1 + 2ℓ₁₃), z₁₃ = (1/3){1 − ℓ₁₃ − j√3 √(ℓ₁₃² − 1)}, and z₁₃* with the unit imaginary number j = √−1 are the solutions to the difference equation a(n) = a(n−1) + a(n−3) for n ≥ 4 with initial conditions a(1) = 1, a(2) = 1, and a(3) = 2. The three numbers are also the solutions to θ³ − θ² − 1 = 0, and ℓ₁₃ = (1/2){((29−3√93)/2)^{1/3} + ((29+3√93)/2)^{1/3}} ≈ 1.6984, a₁₃ = {(z₁₃−1)(z₁₃*−1)+1}/{r₁₃(r₁₃−z₁₃)(r₁₃−z₁₃*)} = {|z₁₃−1|²+1}/{r₁₃|r₁₃−z₁₃|²}, and c₁₃ = {(r₁₃−1)(z₁₃−1)+1}/{z₁₃(z₁₃−r₁₃)(z₁₃−z₁₃*)}.

In addition, if the condition 'skip either 0 or 1 step at each move' is replaced with 'skip either 1 step or 2 steps at each move', we can obtain

  Σ_{m=0}^{n} _mC_{n−2m} = a₂₃ r₂₃ⁿ + 2Re(c₂₃ z₂₃ⁿ)   (1.A.46)

for n = 2, 3, … by noting that _mC_{n−2m} = m!/{(3m−n)!(n−2m)!} = 0 when m = 0, 1, …, ⌈n/3⌉−1 or m = ⌊n/2⌋+1, ⌊n/2⌋+2, …, n. Here, the three numbers r₂₃ = 2ℓ₂₃, z₂₃ = −ℓ₂₃ − j√(3ℓ₂₃² − 1), and z₂₃* are the solutions to the difference equation a(n) = a(n−2) + a(n−3) for n ≥ 5 with initial conditions a(2) = 1, a(3) = 1, and a(4) = 1. The three numbers are also the solutions to θ³ − θ − 1 = 0, and ℓ₂₃ = ((9−√69)/144)^{1/3} + ((9+√69)/144)^{1/3} ≈ 0.6624, a₂₃ = (z₂₃−1)(z₂₃*−1)/{r₂₃²(r₂₃−z₂₃)(r₂₃−z₂₃*)} = |z₂₃−1|²/{r₂₃²|r₂₃−z₂₃|²}, and c₂₃ = (r₂₃−1)(z₂₃−1)/{z₂₃²(z₂₃−r₂₃)(z₂₃−z₂₃*)}.

The sequences of numbers calculated from (1.A.43), (1.A.45), and (1.A.46) can be written as

  { Σ_{m=0}^{n} _mC_{n−m} }_{n=0}^{∞} = {1, 1, 2, 3, 5, 8, …},   (1.A.47)

  { Σ_{m=n₃, n₃+2, …}^{n} _mC_{(n−m)/2} }_{n=0}^{∞} = {1, 1, 1, 2, 3, 4, 6, 9, …},   (1.A.48)

and

  { Σ_{m=0}^{n} _mC_{n−2m} }_{n=2}^{∞} = {1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12, …},   (1.A.49)

respectively.
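The binomial sums on the left-hand sides of (1.A.45) and (1.A.46) can be generated directly, which also confirms the two difference equations; a short sketch (function names b and c for the two variants are ours):

```python
import math

def b(n):
    # left-hand side of (1.A.45): skip either 0 or 2 steps at each move
    return sum(math.comb(m, (n - m) // 2) for m in range(n + 1)
               if (n - m) % 2 == 0 and (n - m) // 2 <= m)

def c(n):
    # left-hand side of (1.A.46): skip either 1 or 2 steps at each move
    return sum(math.comb(m, n - 2 * m) for m in range(n + 1) if 0 <= n - 2 * m <= m)
```

The generated values match (1.A.48) and (1.A.49), and the recursions b(n) = b(n−1) + b(n−3) and c(n) = c(n−2) + c(n−3) hold as stated.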

(B) Order of Operations

When some operations such as limit, integration, and differentiation are evaluated,
a change of order does not usually make a difference. However, the order is of
importance in certain cases. Here, we present some examples in which a change of
order yields different results.

Example 1.A.16 (Gelbaum and Olmsted 1964) For the function

  f(x, y) = y⁻²,    0 < x < y < 1,
          = −x⁻²,   0 < y < x < 1,     (1.A.50)
          = 0,      otherwise,

we have ∫₀¹∫₀¹ f(x, y) dx dy = ∫₀¹ dy = 1 because ∫₀¹ f(x, y) dx = ∫₀^y dx/y² − ∫_y¹ dx/x² = 1/y − (1/y − 1) = 1. On the other hand, ∫₀¹∫₀¹ f(x, y) dy dx = ∫₀¹ (−1) dx = −1 because ∫₀¹ f(x, y) dy = −∫₀^x dy/x² + ∫_x¹ dy/y² = −1/x − (1 − 1/x) = −1. In other words, ∫₀¹∫₀¹ f(x, y) dx dy ≠ ∫₀¹∫₀¹ f(x, y) dy dx. ♦
Example 1.A.17 (Gelbaum and Olmsted 1964) Consider a sequence {fₙ(x)}_{n=1}^{∞} with fₙ(x) = n²xe^{−nx} on the support [0, 1]. Then, lim_{n→∞} ∫₀¹ fₙ(x) dx = lim_{n→∞} [−(nx+1)e^{−nx}]₀¹ = −lim_{n→∞} {(n+1)e^{−n} − 1}, i.e.,

  lim_{n→∞} ∫₀¹ fₙ(x) dx = 1.   (1.A.51)

On the other hand, lim_{n→∞} fₙ(x) is 0 for 0 ≤ x ≤ 1 because fₙ(x) = 0 for x = 0 and lim_{n→∞} fₙ(x) = (1/x) lim_{n→∞} (nx)²/e^{nx} = 0 for 0 < x ≤ 1. Thus, ∫₀¹ lim_{n→∞} fₙ(x) dx = ∫₀¹ 0 dx = 0 and, therefore, ∫₀¹ lim_{n→∞} fₙ(x) dx ≠ lim_{n→∞} ∫₀¹ fₙ(x) dx. ♦

Example 1.A.18 (Gelbaum and Olmsted 1964) For the function

  f(x, y) = (x² − y²)/(x² + y²),   x² + y² ≠ 0,
          = 0,                     x = y = 0,     (1.A.52)

we have lim_{x→0} lim_{y→0} f(x, y) ≠ lim_{y→0} lim_{x→0} f(x, y) because lim_{x→0} lim_{y→0} f(x, y) = lim_{x→0} x²/x² = 1 and lim_{y→0} lim_{x→0} f(x, y) = lim_{y→0} (−y²)/y² = −1. ♦

Example 1.A.19 (Gelbaum and Olmsted 1964) Consider a sequence {fₙ(x)}_{n=1}^{∞} of functions in which fₙ(x) = x/(1 + n²x²) for |x| ≤ 1. Then, (d/dx) lim_{n→∞} fₙ(x) = 0, but

  lim_{n→∞} (d/dx) fₙ(x) = lim_{n→∞} (1 − n²x²)/(1 + n²x²)²
                         = 1,   x = 0,
                         = 0,   0 < |x| ≤ 1,     (1.A.53)

implying that (d/dx) lim_{n→∞} fₙ(x) ≠ lim_{n→∞} (d/dx) fₙ(x). ♦

Example 1.A.20 (Gelbaum and Olmsted 1964) Consider the function

  f(x, y) = (x³/y²) exp(−x²/y),   y > 0,
          = 0,                    y = 0.     (1.A.54)

For any value of x, we have ∫₀¹ f(x, y) dy = x exp(−x²). If we let g(x) = x exp(−x²), then we have (d/dx) ∫₀¹ f(x, y) dy = g′(x) = (1 − 2x²) exp(−x²) for any value of x. On the other hand, (∂/∂x) f(x, y) = 0 for any value of y when x = 0 because

  (∂/∂x) f(x, y) = (3x²/y² − 2x⁴/y³) exp(−x²/y),   y > 0,
                 = 0,                              y = 0.     (1.A.55)

In other words,

  ∫₀¹ (∂/∂x) f(x, y) dy = ∫₀¹ (3x²/y² − 2x⁴/y³) exp(−x²/y) dy,   x ≠ 0,
                        = ∫₀¹ 0 dy,                              x = 0
                        = (1 − 2x²) exp(−x²),                    x ≠ 0,
                        = 0,                                     x = 0,     (1.A.56)

and thus (d/dx) ∫₀¹ f(x, y) dy ≠ ∫₀¹ (∂/∂x) f(x, y) dy. ♦

Example 1.A.21 (Gelbaum and Olmsted 1964) Consider the Cantor function φ_C(x) discussed in Example 1.3.11 and f(x) = 1 for x ∈ [0, 1]. Then, both the Riemann-Stieltjes and Lebesgue-Stieltjes integrals produce ∫₀¹ f(x) dφ_C(x) = [f(x)φ_C(x)]₀¹ − ∫₀¹ φ_C(x) df(x) = φ_C(1) − φ_C(0) − 0, i.e.,

  ∫₀¹ f(x) dφ_C(x) = 1   (1.A.57)

while the Lebesgue integral results in ∫₀¹ f(x) φ′_C(x) dx = ∫₀¹ 0 dx = 0. ♦

(C) Sum of Powers of Two Real Numbers

Theorem 1.A.3 If the sum α + β and product αβ of two numbers α and β are both integers, then

  αⁿ + βⁿ = integer   (1.A.58)

for n ∈ {0, 1, …}.

Proof Let us prove the theorem via mathematical induction. It is clear that (1.A.58) holds true when n = 0 and 1. When n = 2, α² + β² = (α+β)² − 2αβ is an integer. Assume αⁿ + βⁿ are all integers for n = 1, 2, …, k−1. Then,

  αᵏ + βᵏ = (α+β)ᵏ − _kC_1 αβ (α^{k−2} + β^{k−2}) − _kC_2 (αβ)² (α^{k−4} + β^{k−4}) − ⋯
            − _kC_{(k−1)/2} (αβ)^{(k−1)/2} (α+β),   k is odd,
            − _kC_{k/2} (αβ)^{k/2},                 k is even,     (1.A.59)

which implies that αⁿ + βⁿ is an integer when n = k because the binomial coefficient _kC_j is always an integer for j = 0, 1, …, k when k is a natural number. In other words, if αβ and α + β are both integers, then αⁿ + βⁿ is also an integer when
n ∈ {0, 1, …}. ♠
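An equivalent way to see Theorem 1.A.3 is through the two-term recursion s_k = (α+β)s_{k−1} − (αβ)s_{k−2} satisfied by s_k = αᵏ + βᵏ; a sketch (the function name power_sum is ours):

```python
# s_k = alpha^k + beta^k satisfies s_k = (alpha+beta) s_{k-1} - (alpha*beta) s_{k-2},
# so integer alpha+beta and alpha*beta keep every s_k an integer (Theorem 1.A.3)
def power_sum(p, q, n):
    """alpha^n + beta^n when alpha + beta = p and alpha * beta = q."""
    if n == 0:
        return 2
    s_prev, s_cur = 2, p   # s_0 = 2, s_1 = p
    for _ in range(n - 1):
        s_prev, s_cur = s_cur, p * s_cur - q * s_prev
    return s_cur
```

With p = 1 and q = −1 (the golden-ratio pair appearing in (1.A.43)) this generates the Lucas numbers 1, 3, 4, 7, 11, …, all integers as the theorem guarantees.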

(D) Differences of Geometric Sequences

Theorem 1.A.4 Consider the difference

  Dₙ = αaⁿ − βbⁿ   (1.A.60)

of two geometric sequences, where α > 0, a > b > 0, a ≠ 1, and b ≠ 1. Let

  r = ln{(1−b)β/((1−a)α)} / ln(a/b).   (1.A.61)

Then, the sequence {Dₙ}_{n=1}^{∞} has the following properties:

(1) For 0 < a < 1, Dₙ is the largest at n = r and n = r+1 if r is an integer and at n = ⌈r⌉ if r is not an integer.
(2) For a > 1, Dₙ is the smallest at n = r and n = r+1 if r is an integer and at n = ⌈r⌉ if r is not an integer.

Proof Consider the case 0 < a < 1. Then, from D_{n+1} − Dₙ = bⁿ(1−b)β − aⁿ(1−a)α, we have D_{n+1} > Dₙ for n < r, D_{n+1} = Dₙ for n = r, and D_{n+1} < Dₙ for n > r and, subsequently, (1). We can similarly show (2). ♠

Example 1.A.22 The sequence {αaⁿ − βbⁿ}_{n=1}^{∞}

(1) is increasing and decreasing if a > 1 and 0 < a < 1, respectively, when (1−b)β/((1−a)α) < a/b,
(2) is first decreasing and then increasing, and first increasing and then decreasing, if a > 1 and 0 < a < 1, respectively, when (1−b)β/((1−a)α) > a/b, and
(3) is increasing and decreasing if a > 1 and 0 < a < 1, respectively, when (1−b)β/((1−a)α) = a/b or, equivalently, when αa² − βb² = αa − βb.

Example 1.A.23 Assume α = β. Then,

(1) {aⁿ − bⁿ}_{n=1}^{∞} is an increasing and decreasing sequence when a > 1 and a + b < 1, respectively,
(2) {aⁿ − bⁿ}_{n=1}^{∞} is a decreasing sequence and a² − b² = a − b when a + b = 1, and
(3) {aⁿ − bⁿ}_{n=1}^{∞} is a sequence that first increases and then decreases with the maximum at n = ⌈r⌉ if r is not an integer and at n = r and n = r+1 if r is an integer when a + b > 1 and 0 < a < 1.

Example 1.A.24 Assume α = β = 1, a = 0.95, and b = 0.4. Then, we have 0.95 − 0.4 < 0.95² − 0.4² < 0.95³ − 0.4³ and 0.95³ − 0.4³ > 0.95⁴ − 0.4⁴ > ⋯ because ⌈r⌉ = ⌈ln(0.6/0.05)/ln(0.95/0.4)⌉ ≈ ⌈2.87⌉ = 3. ♦

(E) Selections of Numbers with No Number Unchosen

The number of ways to select r different elements from a set of n distinct elements is _nC_r. Because every element will be selected as many times as any other element, each of the n elements will be selected _nC_r × r/n = _{n−1}C_{r−1} times over the _nC_r selections. Each of the n elements will be included at least once if we choose appropriately

  m₁ = ⌈n/r⌉   (1.A.62)

selections among the _nC_r selections. For example, assume the set {1, 2, 3, 4, 5} and r = 2. Then, we have m₁ = ⌈5/2⌉ = 3, and thus, each of the five elements is included at least once in the three selections (1, 2), (3, 4), and (4, 5).

Next, it is possible that one or more elements will not be included if we consider _{n−1}C_r selections or less among the total _nC_r selections. For example, for the set {1, 2, 3, 4, 5} and r = 2, in some choices of ₄C₂ = 6 selections or less such as (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), and (3, 4) among the total of ₅C₂ = 10 selections, the element 5 is not included.

On the other hand, each of the n elements will be included at least once in any

  m₂ = 1 + _{n−1}C_r   (1.A.63)

selections. Here, we have

  _{n−1}C_r = (1 − r/n) _nC_r
            = _nC_r − _{n−1}C_{r−1}.   (1.A.64)

The identity (1.A.64) implies that the number of ways for a specific element not to
be included when selecting r elements from a set of n distinct elements is the same
as the following two numbers:
(1) The number of ways to select r elements from a set of n − 1 distinct elements.
(2) The difference between the number of ways to select r elements from a set of n
distinct elements and that for a specific element to be included when selecting r
elements from a set of n distinct elements.

(F) Fubini's Theorem

Theorem 1.A.5 When the function f(x, y) is continuous on A = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, we have

  ∬_A f(x, y) dx dy = ∫_c^d ∫_a^b f(x, y) dx dy
                    = ∫_a^b ∫_c^d f(x, y) dy dx.   (1.A.65)

In addition, we have

  ∬_A f(x, y) dx dy = ∫_a^b ∫_{g₁(x)}^{g₂(x)} f(x, y) dy dx   (1.A.66)

if f(x, y) is continuous on A = {(x, y) : a ≤ x ≤ b, g₁(x) ≤ y ≤ g₂(x)} and both
g₁ and g₂ are continuous on [a, b].
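For a continuous integrand, the two iterated integrals in (1.A.65) agree, in contrast with the discontinuous counterexample of Example 1.A.16. A crude midpoint-rule sketch (the integrand x·y and the grid size are arbitrary choices of ours):

```python
# midpoint-rule check that the two iterated integrals in (1.A.65) agree
# for a continuous integrand: f(x, y) = x * y on [0, 1] x [0, 1], exact value 1/4
N = 200
h = 1.0 / N
xs = [(i + 0.5) * h for i in range(N)]
dx_first = sum(sum(x * y for x in xs) * h for y in xs) * h   # inner integral over x
dy_first = sum(sum(x * y for y in xs) * h for x in xs) * h   # inner integral over y
```

Both orders give 1/4 up to floating-point error.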

(G) Partitions of Numbers

A representation of a natural number as the sum of natural numbers is also called


a partition. Denote the number of partitions for a natural number n as the sum of
k natural numbers by M(n, k). Then, the number N (n) of partitions for a natural
number n can be expressed as


  N(n) = Σ_{k=1}^{n} M(n, k).   (1.A.67)

As we can see, for example, from

1: {1},
2: {2}, {1,1},
(1.A.68)
3: {3}, {2,1}, {1,1,1},
4: {4}, {3,1}, {2,2}, {2,1,1}, {1,1,1,1},

we have N (1) = M(1, 1) = 1, N (2) = M(2, 1) + M(2, 2) = 2, and N (3) =


M(3, 1) + M(3, 2) + M(3, 3) = 3. In addition, M(4, 1) = 1, M(4, 2) = 2,
M(4, 3) = 1, and M(4, 4) = 1. In general, the number M(n, k) satisfies

M(n, k) = M(n − 1, k − 1) + M(n − k, k). (1.A.69)



Example 1.A.25 We have M(5, 3) = M(5−1, 3−1) + M(5−3, 3) = M(4, 2) + M(2, 3) = 2 + 0 = 2 from (1.A.69). ♦
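The recursion (1.A.69) translates directly into code, giving a quick way to tabulate M(n, k) and N(n) (memoization via lru_cache is our implementation choice):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def M(n, k):
    """Number of partitions of n into exactly k parts, via the recursion (1.A.69)."""
    if k < 1 or n < k:
        return 0
    if k == 1 or k == n:
        return 1
    return M(n - 1, k - 1) + M(n - k, k)

def N(n):
    # (1.A.67): total number of partitions of n
    return sum(M(n, k) for k in range(1, n + 1))
```

For instance, M(5, 3) returns 2, matching Example 1.A.25.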

Theorem 1.A.6 Denote the least common multiplier of the k consecutive natural numbers 1, 2, …, k by k̃. Let the quotient and remainder of n when divided by k be Q_k and R_k, respectively. If we write

  n = k̃ Q_{k̃} + R_{k̃},   (1.A.70)

then the number M(n, k) can be expressed as

  M(n, k) = Σ_{i=0}^{k−1} c_{i,k}(R_{k̃}) Q_{k̃}^i,   R_{k̃} = 0, 1, …, k̃ − 1   (1.A.71)

in terms of k̃ polynomials of order k − 1 in Q_{k̃}, where {c_{i,k}(·)}_{i=0}^{k−1} are the coefficients of the polynomial.

Based on Theorem 1.A.6, we can obtain M(n, 1) = 1,

  M(n, 2) = (n−1)/2,   n is odd,
          = n/2,       n is even,     (1.A.72)

  12 M(n, 3) = n²,      R₆ = 0;    n² − 1,   R₆ = 1, 5;
               n² − 4,  R₆ = 2, 4; n² + 3,   R₆ = 3,     (1.A.73)

and

  144 M(n, 4) = n³ + 3n²,            R₁₂ = 0,
                n³ + 3n² − 20,       R₁₂ = 2,
                n³ + 3n² + 32,       R₁₂ = 4,
                n³ + 3n² − 36,       R₁₂ = 6,
                n³ + 3n² + 16,       R₁₂ = 8,
                n³ + 3n² − 4,        R₁₂ = 10,
                n³ + 3n² − 9n + 5,   R₁₂ = 1, 7,
                n³ + 3n² − 9n − 27,  R₁₂ = 3, 9,
                n³ + 3n² − 9n − 11,  R₁₂ = 5, 11,     (1.A.74)

for example. Table 1.8 shows the 60 polynomials of order four in Q₆₀ for the representation of M(n, 5).

Table 1.8 Coefficients {c_{j,5}(r)}_{j=0}^{4} in M(n, 5) = c₄,₅(R₆₀)Q₆₀⁴ + c₃,₅(R₆₀)Q₆₀³ + c₂,₅(R₆₀)Q₆₀² + c₁,₅(R₆₀)Q₆₀ + c₀,₅(R₆₀)
r c4,5 (r ), c3,5 (r ), c2,5 (r ), c1,5 (r ), c0,5 (r ) r c4,5 (r ), c3,5 (r ), c2,5 (r ), c1,5 (r ), c0,5 (r )
0 4500, 750, 25/2, −5/2, 0 1 4500, 1050, 115/2, 1/2, 0
2 4500, 1350, 235/2, 3/2, 0 3 4500, 1650, 385/2, 17/2, 0
4 4500, 1950, 565/2, 29/2, 0 5 4500, 2250, 775/2, 55/2, 1
6 4500, 2550, 1015/2, 81/2, 1 7 4500, 2850, 1285/2, 123/2, 2
8 4500, 3150, 1585/2, 167/2, 3 9 4500, 3450, 1915/2, 229/2, 5
10 4500, 3750, 2275/2, 295/2, 7 11 4500, 4050, 2665/2, 381/2, 10
12 4500, 4350, 3085/2, 473/2, 13 13 4500, 4650, 3535/2, 587/2, 18
14 4500, 4950, 4015/2, 709/2, 23 15 4500, 5250, 4525/2, 855/2, 30
16 4500, 5550, 5065/2, 1011/2, 37 17 4500, 5850, 5635/2, 1193/2, 47
18 4500, 6150, 6235/2, 1387/2, 57 19 4500, 6450, 6865/2, 1609/2, 70
20 4500, 6750, 7525/2, 1845/2, 84 21 4500, 7050, 8215/2, 2111/2, 101
22 4500, 7350, 8935/2, 2393/2, 119 23 4500, 7650, 9685/2, 2707/2, 141
24 4500, 7950, 10465/2, 3039/2, 164 25 4500, 8250, 11275/2, 3405/2, 192
26 4500, 8550, 12115/2, 3791/2, 221 27 4500, 8850, 12985/2, 4213/2, 255
28 4500, 9150, 13885/2, 4657/2, 291 29 4500, 9450, 14815/2, 5139/2, 333
30 4500, 9750, 15775/2, 5645/2, 377 31 4500, 10050, 16765/2, 6191/2, 427
32 4500, 10350, 17785/2, 6763/2, 480 33 4500, 10650, 18835/2, 7377/2, 540
34 4500, 10950, 19915/2, 8019/2, 603 35 4500, 11250, 21025/2, 8705/2, 674
36 4500, 11550, 22165/2, 9421/2, 748 37 4500, 11850, 23335/2, 10183/2, 831
38 4500, 12150, 24535/2, 10977/2, 918 39 4500, 12450, 25765/2, 11819/2, 1014
40 4500, 12750, 27025/2, 12695/2, 1115 41 4500, 13050, 28315/2, 13621/2, 1226
42 4500, 13350, 29635/2, 14583/2, 1342 43 4500, 13650, 30985/2, 15597/2, 1469
44 4500, 13950, 32365/2, 16649/2, 1602 45 4500, 14250, 33775/2, 17755/2, 1747
46 4500, 14550, 35215/2, 18901/2, 1898 47 4500, 14850, 36685/2, 20103/2, 2062
48 4500, 15150, 38185/2, 21347/2, 2233 49 4500, 15450, 39715/2, 22649/2, 2418
50 4500, 15750, 41275/2, 23995/2, 2611 51 4500, 16050, 42865/2, 25401/2, 2818
52 4500, 16350, 44485/2, 26853/2, 3034 53 4500, 16650, 46135/2, 28367/2, 3266
54 4500, 16950, 47815/2, 29929/2, 3507 55 4500, 17250, 49525/2, 31555/2, 3765
56 4500, 17550, 51265/2, 33231/2, 4033 57 4500, 17850, 53035/2, 34973/2, 4319
58 4500, 18150, 54835/2, 36767/2, 4616 59 4500, 18450, 56665/2, 38629/2, 4932

Exercises

Exercise 1.1 Show that B c ⊆ Ac when A ⊆ B.


 c
∞ ∞

Exercise 1.2 Show that ∩ Ai = ∪ Aic for a sequence {Ai }i=1 of sets.
i=1 i=1

Exercise 1.3 Express the difference A − B in terms only of intersection and sym-
metric difference, and the union A ∪ B in terms only of intersection and symmetric
difference.

Exercise 1.4 Consider a sequence {A_i}_{i=1}^{n} of finite sets. Show

|A_1 ∪ A_2 ∪ · · · ∪ A_n| = Σ_i |A_i| − Σ_{i<j} |A_i ∩ A_j| + Σ_{i<j<k} |A_i ∩ A_j ∩ A_k| − · · · + (−1)^{n−1} |A_1 ∩ A_2 ∩ · · · ∩ A_n|   (1.E.1)

and

|A_1 ∩ A_2 ∩ · · · ∩ A_n| = Σ_i |A_i| − Σ_{i<j} |A_i ∪ A_j| + Σ_{i<j<k} |A_i ∪ A_j ∪ A_k| − · · · + (−1)^{n−1} |A_1 ∪ A_2 ∪ · · · ∪ A_n|,   (1.E.2)

where |A_i| denotes the number of elements in A_i.
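The inclusion–exclusion identity (1.E.1) is easy to check numerically on small finite sets. The sketch below (in Python; the particular example sets are arbitrary illustrations, not from the text) sums the signed intersection sizes over every non-empty subfamily:

```python
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    # Sum (-1)^(r-1) * |intersection| over every non-empty subfamily of r sets,
    # i.e. the right-hand side of (1.E.1).
    total = 0
    for r in range(1, len(sets) + 1):
        for family in combinations(sets, r):
            total += (-1) ** (r - 1) * len(set.intersection(*map(set, family)))
    return total

A1, A2, A3 = {1, 2, 3}, {2, 3, 4}, {3, 4, 5, 6}
assert union_size_by_inclusion_exclusion([A1, A2, A3]) == len(A1 | A2 | A3) == 6
```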

Exercise 1.5 For a sequence {A_i}_{i=1}^{n} of finite sets, show

|A_1 Δ A_2 Δ · · · Δ A_n| = Σ_i |A_i| − 2 Σ_{i<j} |A_i ∩ A_j| + 4 Σ_{i<j<k} |A_i ∩ A_j ∩ A_k| − · · · + (−2)^{n−1} |A_1 ∩ A_2 ∩ · · · ∩ A_n|.   (1.E.3)

(Hint. As observed in Example 1.1.35, any element in the set A_1 Δ A_2 Δ · · · Δ A_n belongs to only an odd number of sets among {A_i}_{i=1}^{n}.)

Exercise 1.6 Is the set of polynomials with integer coefficients countable?

Exercise 1.7 Is the set of algebraic numbers29 countable?

Exercise 1.8 Show that the sets below are countable.


(1) The set of functions that map a finite subset of a countable set A onto a countable
set B.
(2) The set of convergent sequences of natural numbers.

Exercise 1.9 Is the collection of all non-overlapping open intervals with real end
points countable or uncountable?

Exercise 1.10 Find an injection from the set A to the set B in each of the pairs A
and B below. Here, J0 = J+ ∪ {0}, i.e.,

J0 = {0, 1, . . .}. (1.E.4)

(1) A = J0 × J0 . B = J0 .

29 When n = 1, 2, . . . and {a_i}_{i=0}^{n} are all integers with a_n ≠ 0, a number z satisfying a_n z^n + a_{n−1} z^{n−1} + · · · + a_0 = 0 is called an algebraic number. A number which is not an algebraic number is called a transcendental number.

(2) A = (−∞, ∞). B = (0, 1).


(3) A = R. B = the set of infinite sequences of 0 and 1.
(4) A = the set of infinite sequences of 0 and 1. B = [0, 1].
(5) A = the set of infinite sequences of natural numbers. B = the set of infinite
sequences of 0 and 1.
(6) A = the set of infinite sequences of real numbers. B = the set of infinite
sequences of 0 and 1.

Exercise 1.11 Find a function from the set A to the set B in each of the pairs A and
B below.
(1) A = J0 . B = J0 × J0 .
(2) A = J0 . B = Q.
(3) A = the Cantor set. B = [0, 1].
(4) A = the set of infinite sequences of 0 and 1. B = [0, 1].

Exercise 1.12 Find a one-to-one correspondence between the set A and the set B
in each of the pairs A and B below.
(1) A = (a, b). B = (c, d). Here, −∞ ≤ a < b ≤ ∞ and −∞ ≤ c < d ≤ ∞.
(2) A = the set of infinite sequences of 0, 1, and 2. B = the set of infinite sequences
of 0 and 1.
(3) A = [0, 1). B = [0, 1) × [0, 1).

Exercise 1.13 Assume f_1 : A → B and f_2 : C → D are both one-to-one correspondences, A ∩ C = ∅, and B ∩ D = ∅. Show that f = f_1 ∪ f_2 or, equivalently, f : A ∪ C → B ∪ D is a one-to-one correspondence.

Exercise 1.14 Find a one-to-one correspondence30 between A = J0 and B = J0 × J0.
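One concrete choice for Exercise 1.14 is the inverse of the Cantor pairing function, which enumerates J0 × J0 along anti-diagonals (footnote 30 mentions the related Gödel pairing function). A sketch, with function names of our own choosing:

```python
def cantor_pair(i, j):
    # Bijection J0 x J0 -> J0: enumerate pairs along anti-diagonals.
    return (i + j) * (i + j + 1) // 2 + j

def cantor_unpair(z):
    # Inverse map J0 -> J0 x J0, i.e. a one-to-one correspondence as required.
    w = int(((8 * z + 1) ** 0.5 - 1) // 2)   # index of the anti-diagonal
    j = z - w * (w + 1) // 2
    return w - j, j

# Verify the two maps invert each other on an initial segment.
for i in range(40):
    for j in range(40):
        assert cantor_unpair(cantor_pair(i, j)) == (i, j)
```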

Exercise 1.15 Is the collection of intervals with rational end points in the space R
of real numbers countable?

Exercise 1.16 Show that U − C ∼ U when U is an uncountable set and C is a


countable set.

Exercise 1.17 Is the sum of two rational numbers a rational number? If we add one
more rational number, is the result a rational number? If we add an infinite number of
rational numbers, is the result a rational number? (Hint. The set of rational numbers
is closed under a finite number of additions, but is not closed under an infinite number
of additions.)

Exercise 1.18 Here, Q denotes the set of rational numbers defined in (1.1.29) and
0 < a < b < 1.

30 One of such one-to-one correspondences is a function called the Gödel pairing function.

(1) Find a rational number between a and b when a ∈ Q and b ∈ Q.


(2) Find an irrational number between a and b when a ∈ Q and b ∈ Qc .
(3) Find an irrational number between a and b when a ∈ Q and b ∈ Q.
(4) Find an irrational number between a and b when a ∈ Qc and b ∈ Qc .
(5) Find a rational number between a and b when a ∈ Qc and b ∈ Qc .
(6) Find a rational number between a and b when a ∈ Q and b ∈ Qc .

Exercise 1.19 Consider a game between two players. After a countable subset A of the interval [0, 1] is determined, the two players alternately choose one number from {0, 1, . . . , 9}. Let the numbers chosen by the first and second players be x_1, x_2, . . . and y_1, y_2, . . ., respectively. When the number 0.x_1 y_1 x_2 y_2 · · · belongs to A, the first player wins and otherwise the second player wins. Find a way for the second player to win.

Exercise 1.20 For a function f : Ω → R, show

f^{−1}(A^c) = ( f^{−1}(A) )^c,   (1.E.5)

f^{−1}(A ∪ B) = f^{−1}(A) ∪ f^{−1}(B),   (1.E.6)

and

f^{−1}(A ∩ B) = f^{−1}(A) ∩ f^{−1}(B),   (1.E.7)

where A, B ⊆ R.
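The identities (1.E.5)–(1.E.7) can be sanity-checked on a finite domain. A sketch (the particular f and sets are arbitrary, and for (1.E.5) the complement is taken within f(domain) so everything stays finite):

```python
def preimage(f, domain, target):
    # f^{-1}(target) = {x in domain : f(x) in target}
    return {x for x in domain if f(x) in target}

f = lambda x: x * x
domain = set(range(-5, 6))
A, B = {0, 1, 4}, {4, 9, 100}
comp_A = {f(x) for x in domain} - A   # complement of A within f(domain)

assert preimage(f, domain, A | B) == preimage(f, domain, A) | preimage(f, domain, B)  # (1.E.6)
assert preimage(f, domain, A & B) == preimage(f, domain, A) & preimage(f, domain, B)  # (1.E.7)
assert preimage(f, domain, comp_A) == domain - preimage(f, domain, A)                 # (1.E.5)
```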

Exercise 1.21 Show that the function f(x) = x^2 is uniformly continuous on S = {x : 0 < x < 4}.

Exercise 1.22 Show that the function f(x) = 1/x is continuous but not uniformly continuous on S = (0, ∞).

Exercise 1.23 Show that the function f(x) = √x is uniformly continuous on S = (0, ∞).

Exercise 1.24 Confirm

lim_{m→∞} ∫_{−∞}^{∞} (sin mx)/(πx) f(x) dx = f(0)   (1.E.8)

shown in (1.4.24). (Hint. Consider the inverse Fourier transform of a product.)

Exercise 1.25 Recollect the definition of the unit step function.


(1) Express u(ax + b), u(sin x), and u(e^x − π) in other formulas.
(2) Obtain ∫_{−∞}^{x} u(t − y) dt.

Exercise 1.26 Obtain the Fourier transform F {u(x)} of the unit step function u(x)
by following the order shown below.

(1) Let the Fourier transform of the function

s_α(x) = { e^{−αx}, x > 0;  −e^{αx}, x < 0 }   (1.E.9)

with α > 0 be S_α(ω) = F{s_α(x)}. Show

lim_{α→0} S_α(ω) = 2/(jω).   (1.E.10)

(2) Show that the Fourier transform of the impulse function δ(x) is F {δ(x)} = 1.
Then, show

F {1} = 2π δ(ω) (1.E.11)

using a property of Fourier transform.


(3) Noting that sgn(x) = 2u(x) − 1 and therefore u(x) = (1/2){1 + sgn(x)}, we have u(x) = (1/2){1 + lim_{α→0} s_α(x)} because sgn(x) = lim_{α→0} s_α(x) from (1.E.9). Based on this result, and noting (1.E.10) and (1.E.11), obtain the Fourier transform

F{u(x)} = πδ(ω) + 1/(jω)   (1.E.12)

of the unit step function u(x).

Exercise 1.27 Express δ′(x) cos x in another formula.


Exercise 1.28 Calculate ∫_{−2π}^{2π} e^{πx} δ(x^2 − π^2) dx. When 0 ≤ x < 2π, express δ(sin x) in another formula.

Exercise 1.29 Calculate ∫_{−∞}^{∞} δ′(x^2 − 3x + 2) dx and ∫_{−∞}^{∞} (cos x + sin x) δ′(x^3 + x^2 + x) dx.

Exercise 1.30 The multi-dimensional impulse function can be defined as

δ(x_1, x_2, . . . , x_n) = ∏_{i=1}^{n} δ(x_i).   (1.E.13)

Show

δ(x, y) = δ(r)/(πr)   (1.E.14)

for r = √(x^2 + y^2) and

δ (r )
δ (x, y, z) = (1.E.15)
2πr 2

for r = x 2 + y2 + z2.
Exercise 1.31 Obtain the limits of the sequences of intervals below.
(1) {(1 + 1/n, 2)}_{n=1}^{∞}
(2) {[1 + 1/n, 2]}_{n=1}^{∞}
(3) {(1, 1 + 1/n)}_{n=1}^{∞}
(4) {[1, 1 + 1/n]}_{n=1}^{∞}
(5) {(1 − 1/n, 2)}_{n=1}^{∞}
(6) {[1 − 1/n, 2]}_{n=1}^{∞}
(7) {(1, 2 − 1/n)}_{n=1}^{∞}
(8) {[1, 2 − 1/n]}_{n=1}^{∞}
Exercise 1.32 Consider the sequence {f_n(x)}_{n=1}^{∞} of functions with

f_n(x) = { 2n^2 x, 0 ≤ x ≤ 1/(2n);  2n − 2n^2 x, 1/(2n) ≤ x ≤ 1/n;  0, 1/n ≤ x ≤ 1 }.   (1.E.16)

By obtaining ∫_0^1 lim_{n→∞} f_n(x) dx and lim_{n→∞} ∫_0^1 f_n(x) dx, confirm that the order of integration and limit are not always interchangeable.
Exercise 1.33 For the function

f_n(x) = { 2n^3 x, 0 ≤ x ≤ 1/(2n);  2n^2 − 2n^3 x, 1/(2n) ≤ x ≤ 1/n;  0, 1/n ≤ x ≤ 1 },   (1.E.17)

and a number b ∈ (0, 1], obtain ∫_0^b lim_{n→∞} f_n(x) dx and lim_{n→∞} ∫_0^b f_n(x) dx, which shows that the order of integration and limit are not always interchangeable.
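For Exercise 1.32, a quick numerical check (midpoint rule; function names are ours) shows that ∫_0^1 f_n(x) dx stays at 1/2 for every n, while f_n(x) → 0 for each fixed x > 0, so the limit of the integrals and the integral of the limit differ:

```python
def f(n, x):
    # The triangular spike of (1.E.16): peak value n at x = 1/(2n), support [0, 1/n].
    if x <= 1.0 / (2 * n):
        return 2 * n * n * x
    if x <= 1.0 / n:
        return 2 * n - 2 * n * n * x
    return 0.0

def integral(n, steps=100000):
    # Midpoint rule on [0, 1].
    h = 1.0 / steps
    return sum(f(n, (k + 0.5) * h) for k in range(steps)) * h

for n in (1, 10, 100):
    assert abs(integral(n) - 0.5) < 1e-3            # the integrals converge to 1/2 ...
assert f(10, 0.3) == 0.0 and f(100, 0.3) == 0.0     # ... but f_n(x) -> 0 pointwise for x > 0
```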
Exercise 1.34 Obtain the number of all possible arrangements with ten distinct red
balls and ten distinct black balls.
Exercise 1.35 Show31 the identities

a−1 Cb−1 + a−1 Cb = a Cb , (1.E.18)


Σ_{k=0}^{n} pCk qCn−k = p+qCn,   (1.E.19)

31 Here, (1.E.18) is called Pascal's identity or Pascal's rule.



and


Σ_{k=0}^{n} kCj n−kCm−j = n+1Cm+1,   (1.E.20)

where a and b are complex numbers excluding {a = 0, b = integer}, p and q are complex numbers, n ∈ {0, 1, 2, . . .}, and j and m are integers such that 0 ≤ j ≤ m ≤ n. (Hint. For (1.E.19), recollect (1 + x)^{a+b} = (1 + x)^a (1 + x)^b.)
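The identities (1.E.18)–(1.E.20) can be verified for non-negative integer parameters with `math.comb` (the general complex-parameter case of course requires the gamma-function definition of the binomial coefficient):

```python
from math import comb

# Pascal's rule (1.E.18) for integer a >= 1 and 1 <= b <= a.
for a in range(1, 12):
    for b in range(1, a + 1):
        assert comb(a - 1, b - 1) + comb(a - 1, b) == comb(a, b)

# Vandermonde convolution (1.E.19) for non-negative integers p, q.
for p in range(8):
    for q in range(8):
        for n in range(p + q + 1):
            assert sum(comb(p, k) * comb(q, n - k) for k in range(n + 1)) == comb(p + q, n)

# Identity (1.E.20) for 0 <= j <= m <= n.
for n in range(8):
    for m in range(n + 1):
        for j in range(m + 1):
            assert sum(comb(k, j) * comb(n - k, m - j) for k in range(n + 1)) == comb(n + 1, m + 1)
```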

Exercise 1.36 Show

Σ_{k=0}^{n} (nCk)^2 = 2nCn   (1.E.21)

and

Σ_{k=0}^{n} kCm = Σ_{k=m}^{n} kCm = n+1Cm+1.   (1.E.22)

For two integers n and q such that n ≥ q, show

Σ_{k=q}^{n} nCk kCq = 2^{n−q} nCq.   (1.E.23)
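A brute-force check of (1.E.21)–(1.E.23) for small n:

```python
from math import comb

for n in range(10):
    # (1.E.21): sum of squared binomial coefficients.
    assert sum(comb(n, k) ** 2 for k in range(n + 1)) == comb(2 * n, n)
    # (1.E.22): the hockey-stick identity (terms with k < m vanish).
    for m in range(n + 1):
        assert sum(comb(k, m) for k in range(n + 1)) \
               == sum(comb(k, m) for k in range(m, n + 1)) == comb(n + 1, m + 1)
    # (1.E.23): counting q-subsets inside k-subsets.
    for q in range(n + 1):
        assert sum(comb(n, k) * comb(k, q) for k in range(q, n + 1)) \
               == 2 ** (n - q) * comb(n, q)
```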

Exercise 1.37 Show

Σ_{k1=1}^{k0} Σ_{k2=1}^{k1} · · · Σ_{kn=1}^{k_{n−1}} 1 = Σ_{k1=1}^{k0} Σ_{k2=k1}^{k0} · · · Σ_{kn=k_{n−1}}^{k0} 1 = k0+n−1Cn.   (1.E.24)

(Hint. Consider the number of combinations of choosing n from k0 with repetitions allowed.)
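The left-hand nested sum in (1.E.24) counts tuples with k0 ≥ k1 ≥ · · · ≥ kn ≥ 1, which the hint identifies with combinations with repetition. A brute-force count (helper name is ours) agrees with k0+n−1Cn:

```python
from math import comb
from itertools import product

def nested_count(k0, n):
    # Number of tuples (k1, ..., kn) with k0 >= k1 >= ... >= kn >= 1,
    # i.e. the value of either nested sum in (1.E.24).
    return sum(
        1
        for tup in product(range(1, k0 + 1), repeat=n)
        if all(tup[i] >= tup[i + 1] for i in range(n - 1))
    )

for k0 in range(1, 7):
    for n in range(1, 4):
        assert nested_count(k0, n) == comb(k0 + n - 1, n)
```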

Exercise 1.38 The identity

B̃(α, β) = Γ(β)Γ(α) / Γ(α + β)   (1.E.25)

is shown in Appendix 1.1. Now, based on

Γ(α)Γ(β) = ∫_0^∞ t^{α−1} e^{−t} dt · ∫_0^∞ s^{β−1} e^{−s} ds = ∫_0^∞ ∫_0^∞ t^{α−1} s^{β−1} e^{−(t+s)} ds dt,   (1.E.26)

confirm (1.E.25).
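Equation (1.E.25) can also be checked numerically by comparing a midpoint-rule approximation of the Beta integral with the gamma-function expression; the sketch below restricts to α, β ≥ 1 so the integrand has no endpoint singularity:

```python
from math import gamma

def beta_numeric(a, b, steps=200000):
    # Midpoint-rule approximation of B(a, b) = integral of t^(a-1) (1-t)^(b-1) over [0, 1].
    h = 1.0 / steps
    return sum(((k + 0.5) * h) ** (a - 1) * (1.0 - (k + 0.5) * h) ** (b - 1)
               for k in range(steps)) * h

for a, b in [(1, 1), (2, 3), (2.5, 4.0), (5, 2)]:
    exact = gamma(a) * gamma(b) / gamma(a + b)   # right-hand side of (1.E.25)
    assert abs(beta_numeric(a, b) - exact) < 1e-5
```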

Exercise 1.39 Referring to Table 1.4, show

−1Cz = { (−1)^z, z = 0, 1, . . . ;  (−1)^{z+1}, z = −1, −2, . . . ;  ±∞, otherwise }   (1.E.27)

and

−2Cz = { (−1)^z (z + 1), z = −1, 0, . . . ;  (−1)^{z+1} (z + 1), z = −2, −3, . . . ;  ±∞, otherwise },   (1.E.28)

which imply −1Ca = −1Cb for a + b = −1 and −2Ca = −2Cb for a + b = −2.

Exercise 1.40 Show
(1) 1/2Ck = (−1)^{k−1} (2k)! / {(2k − 1) 2^{2k} (k!)^2} for k = 0, 1, 2, . . ..
(2) −1/2Ck = −(2k − 1) 1/2Ck = (−1)^k (2k)! / {2^{2k} (k!)^2} for k = 0, 1, 2, . . ..
(3) −2k−1Cm = (−1)^m (2k + m)! / {(2k)! m!} for k = 0, 1, 2, . . . and m = 0, 1, 2, . . ..
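The formulas in Exercises 1.39 and 1.40 follow from the generalized binomial coefficient aCk = a(a − 1) · · · (a − k + 1)/k!. A quick numerical confirmation of a few of them (helper name is ours):

```python
from math import factorial

def gbinom(a, k):
    # Generalized binomial coefficient aCk = a(a-1)...(a-k+1) / k! for integer k >= 0.
    num = 1.0
    for i in range(k):
        num *= a - i
    return num / factorial(k)

# (1.E.27): -1Cz = (-1)^z for z = 0, 1, ...
for z in range(10):
    assert gbinom(-1, z) == (-1) ** z

# Exercise 1.40 (3): -2k-1Cm = (-1)^m (2k+m)! / ((2k)! m!)
for k in range(5):
    for m in range(6):
        rhs = (-1) ** m * factorial(2 * k + m) / (factorial(2 * k) * factorial(m))
        assert abs(gbinom(-2 * k - 1, m) - rhs) < 1e-9

# Exercise 1.40 (1): 1/2Ck = (-1)^(k-1) (2k)! / ((2k - 1) 2^(2k) (k!)^2)
for k in range(8):
    rhs = (-1) ** (k - 1) * factorial(2 * k) / ((2 * k - 1) * 2 ** (2 * k) * factorial(k) ** 2)
    assert abs(gbinom(0.5, k) - rhs) < 1e-12
```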

Exercise 1.41 Based on (1.A.15), obtain the values of pC0 − pC1 + pC2 − pC3 + · · · and pC0 + pC1 + pC2 + pC3 + · · · when p > 0. Using the results, obtain Σ_{k=0}^{∞} pC2k+1 and Σ_{k=0}^{∞} pC2k when p > 0.

Exercise 1.42 Obtain the series expansions of g1(z) = (1 + z)^{1/2} and g2(z) = (1 + z)^{−1/2}.

Exercise 1.43 For non-negative numbers α and β such that α + β ≠ 0, show that

αβ/(α + β) ≤ min(α, β) ≤ 2αβ/(α + β) ≤ √(αβ) ≤ (α + β)/2 ≤ max(α, β).   (1.E.29)
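The chain (1.E.29) can be spot-checked numerically; the sample values below are arbitrary positive numbers (so α + β ≠ 0 holds automatically):

```python
from itertools import product
from math import sqrt

# Verify each link of (1.E.29) on a small grid of positive values.
for a, b in product([0.2, 1.0, 2.5, 7.0], repeat=2):
    chain = [a * b / (a + b), min(a, b), 2 * a * b / (a + b),
             sqrt(a * b), (a + b) / 2, max(a, b)]
    assert all(x <= y + 1e-12 for x, y in zip(chain, chain[1:]))
```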

References

M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
G.E. Andrews, R. Askey, R. Roy, Special Functions (Cambridge University, Cambridge, 1999)
E. Artin, The Gamma Function (Translated by M. Butler) (Holt, Rinehart and Winston, New York, 1964)
B.C. Carlson, Special Functions of Applied Mathematics (Academic, New York, 1977)
J.L. Challifour, Generalized Functions and Fourier Analysis: An Introduction (W. A. Benjamin,
Reading, 1972)
C.A. Charalambides, Enumerative Combinatorics (Chapman and Hall, New York, 2002)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
B.R. Gelbaum, J.M.H. Olmsted, Counterexamples in Analysis (Holden-Day, San Francisco, 1964)

I.M. Gelfand, G.E. Shilov, Generalized Functions (Academic, New York, 1964)


H.W. Gould, Combinatorial Identities (Morgantown Printing, Morgantown, 1972)
R.P. Grimaldi, Discrete and Combinatorial Mathematics, 3rd edn. (Addison-Wesley, Reading, 1994)
P.R. Halmos, Measure Theory (Van Nostrand Reinhold, New York, 1950)
R.F. Hoskins, J.S. Pinto, Theories of Generalised Functions (Horwood, Chichester, 2005)
K. Ito (ed.), Encyclopedic Dictionary of Mathematics (Massachusetts Institute of Technology, Cam-
bridge, 1987)
R. Johnsonbaugh, W.E. Pfaffenberger, Foundations of Mathematical Analysis (Marcel Dekker, New
York, 1981)
D.S. Jones, The Theory of Generalised Functions, 2nd edn. (Cambridge University, Cambridge,
1982)
R.P. Kanwal, Generalized Functions: Theory and Applications (Birkhauser, Boston, 2004)
K. Kuratowski, A. Mostowski, Set Theory (North-Holland, Amsterdam, 1976)
S.M. Khaleelulla, Counterexamples in Topological Vector Spaces (Springer, Berlin, 1982)
A.B. Kharazishvili, Nonmeasurable Sets and Functions (Elsevier, Amsterdam, 2004)
M.J. Lighthill, An Introduction to Fourier Analysis and Generalised Functions (Cambridge Uni-
versity, Cambridge, 1980)
M.E. Munroe, Measure and Integration, 2nd edn. (Addison-Wesley, Reading, 1971)
I. Niven, H.S. Zuckerman, H.L. Montgomery, An Introduction to the Theory of Numbers, 5th edn.
(Wiley, New York, 1991)
J.M.H. Olmsted, Advanced Calculus (Appleton-Century-Crofts, New York, 1961)
S.R. Park, J. Bae, H. Kang, I. Song, On the polynomial representation for the number of partitions
with fixed length. Math. Comput. 77(262), 1135–1151 (2008)
J. Riordan, Combinatorial Identities (Wiley, New York, 1968)
F.S. Roberts, B. Tesman, Applied Combinatorics, 2nd edn. (CRC, Boca Raton, 2009)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
K.H. Rosen, J.G. Michaels, J.L. Gross, J.W. Grossman, D.R. Shier, Handbook of Discrete and
Combinatorial Mathematics (CRC, New York, 2000)
H.L. Royden, Real Analysis, 3rd edn. (Macmillan, New York, 1989)
W. Rudin, Principles of Mathematical Analysis, 3rd edn. (McGraw-Hill, New York, 1976)
R. Salem, On some singular monotonic functions which are strictly increasing. Trans. Am. Math.
Soc. 53(3), 427–439 (1943)
A.N. Shiryaev, Probability, 2nd edn. (Springer, New York, 1996)
N.J.A. Sloane, S. Plouffe, Encyclopedia of Integer Sequences (Academic, San Diego, 1995)
D.M.Y. Sommerville, An Introduction to the Geometry of N Dimensions (Dover, New York, 1958)
R.P. Stanley, Enumerative Combinatorics, Vols. 1 and 2 (Cambridge University Press, Cambridge,
1997)
L.A. Steen, J.A. Seebach Jr., Counterexamples in Topology (Holt, Rinehart, and Winston, New
York, 1970)
J. Stewart, Calculus: Early Transcendentals, 7th edn. (Brooks/Coles, Belmont, 2012)
A.A. Sveshnikov (ed.), Problems in Probability Theory, Mathematical Statistics and Theory of
Random Functions (Dover, New York, 1968)
G.B. Thomas, Jr., R.L. Finney, Calculus and Analytic Geometry, 9th edn. (Addison-Wesley, Read-
ing, 1996)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
A. Tucker, Applied Combinatorics (Wiley, New York, 2002)
N.Y. Vilenkin, Combinatorics (Academic, New York, 1971)
W.D. Wallis, J.C. George, Introduction to Combinatorics (CRC, New York, 2010)
A.I. Zayed, Handbook of Function and Generalized Function Transformations (CRC, Boca Raton,
1996)
S. Zhang, J. Jin, Computation of Special Functions (Wiley, New York, 1996)
Chapter 2
Fundamentals of Probability

Probability theory is a branch of measure theory. In measure theory and probability theory, we consider set functions whose values are non-negative real numbers; these values are called the measure and the probability, respectively, of the corresponding set. In probability theory (Ross 1976, 1996), the values are additionally normalized to lie between 0 and 1: loosely speaking, the probability of a set represents the weight or size of the set. More common concepts such as area, weight, volume, and mass are other examples of measures. As we shall see shortly, we can obtain probability by integrating a probability density or by adding probability masses. This is similar to obtaining mass by integrating a mass density or by summing point masses.

2.1 Algebra and Sigma Algebra

We first address the notions of algebra and sigma algebra (Bickel and Doksum 1977; Leon-Garcia 2008), which form the basis for defining probability.

2.1.1 Algebra

Definition 2.1.1 (algebra) A collection A of subsets of a set S satisfying the two


conditions

if A ∈ A and B ∈ A, then A ∪ B ∈ A (2.1.1)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 93


I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8_2

and

if A ∈ A, then Ac ∈ A (2.1.2)

is called an algebra of S.

Example 2.1.1 The collection A1 = {{1}, {2}, S1 , ∅} is an algebra of S1 = {1, 2},


and A2 = {{1}, {2, 3}, S2 , ∅} is an algebra of S2 = {1, 2, 3}. ♦

Example 2.1.2 From de Morgan’s law, (2.1.1), and (2.1.2), we get A ∩ B ∈ A when A ∈ A and B ∈ A for an algebra A. Subsequently, we have

∪_{i=1}^{n} A_i ∈ A   (2.1.3)

and

∩_{i=1}^{n} A_i ∈ A   (2.1.4)

when {A_i}_{i=1}^{n} are all elements of the algebra A. ♦

The theorem below follows from Example 2.1.2.

Theorem 2.1.1 An algebra is closed under a finite number of set operations.

When A_i ∈ A, we always have A_i ∩ A_i^c = ∅ ∈ A and A_i ∪ A_i^c = S ∈ A, as expressed in the theorem below.

Theorem 2.1.2 If A is an algebra of S, then S ∈ A and ∅ ∈ A.

In other words, a collection is not an algebra of S if the collection does not include
∅ or S.

Example 2.1.3 Obtain all the algebras of S = {1, 2, 3}.

Solution The collections A1 = {S, ∅}, A2 = {S, {1}, {2, 3}, ∅}, A3 = {S, {2},
{1, 3}, ∅}, A4 = {S, {3}, {1, 2}, ∅}, and A5 = {S, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3},
∅} are the algebras of S. ♦
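For a finite S, conditions (2.1.1) and (2.1.2) can be checked mechanically. The sketch below (our own helper, with sets represented as frozensets) accepts one of the algebras found in Example 2.1.3 and rejects a collection that is missing a complement:

```python
def is_algebra(S, F):
    # Check Definition 2.1.1 for a finite set S and a collection F of frozensets.
    F = set(F)
    return (all(A <= S for A in F)
            and all(A | B in F for A in F for B in F)   # closed under union, (2.1.1)
            and all(S - A in F for A in F))             # closed under complement, (2.1.2)

S = frozenset({1, 2, 3})
A2 = {frozenset(), frozenset({1}), frozenset({2, 3}), S}
assert is_algebra(S, A2)                                    # the algebra A2 of Example 2.1.3
assert not is_algebra(S, {frozenset(), frozenset({1}), S})  # complement of {1} is missing
```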

Example 2.1.4 Assume J+ = {1, 2, . . .} defined in (1.1.3), and consider the collec-
tion A1 of all the sets obtained from a finite number of unions of the sets {1}, {2}, . . .
each containing a single natural number. Now, J+ is not an element of A1 because
it is not possible to obtain J+ from a finite number of unions of the sets {1}, {2}, . . ..
Consequently, A1 is not an algebra of J+ . ♦

Definition 2.1.2 (generated algebra) For a collection C of subsets of a set, the small-
est algebra to which all the element sets in C belong is called the algebra generated
from C and is denoted by A (C).
The implication of A (C) being the smallest algebra is that any algebra to which
all the element sets of C belong also contains all the element sets of A (C) as its
elements.
Example 2.1.5 When S = {1, 2, 3}, the algebra generated from C = {{1}} is A (C) =
A1 = {S, {1}, {2, 3}, ∅} because A2 = {S, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, ∅} con-
tains all the elements of A1 = {S, {1}, {2, 3}, ∅}. ♦
Example 2.1.6 For the collection C = {{a}} of S = {a, b, c, d}, the algebra gener-
ated from C is A (C) = {∅, {a}, {b, c, d}, S}. ♦
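For a finite S, the generated algebra A(C) of Definition 2.1.2 can be computed by repeatedly closing the collection under complement and union until nothing new appears. A sketch (helper name is ours) reproduces Example 2.1.6:

```python
def generated_algebra(S, C):
    # Close C under complement and union until a fixed point is reached (S finite).
    S = frozenset(S)
    F = {frozenset(A) for A in C} | {frozenset(), S}
    while True:
        new = {S - A for A in F} | {A | B for A in F for B in F}
        if new <= F:
            return F
        F |= new

F = generated_algebra({'a', 'b', 'c', 'd'}, [{'a'}])
assert F == {frozenset(), frozenset({'a'}), frozenset({'b', 'c', 'd'}),
             frozenset({'a', 'b', 'c', 'd'})}   # matches Example 2.1.6
```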

Theorem 2.1.3 Let A be an algebra of a set S, and {A_i}_{i=1}^{∞} be a sequence of sets in A. Then, A contains a sequence {B_i}_{i=1}^{∞} of sets such that

B_m ∩ B_n = ∅   (2.1.5)

for m ≠ n and

∪_{i=1}^{∞} B_i = ∪_{i=1}^{∞} A_i.   (2.1.6)

Proof The theorem can be proved similarly as in (1.1.32)–(1.1.34). ♠


Example 2.1.7 Assume the algebra A = {{1}, {2, 3}, S, ∅} of S = {1, 2, 3}. When
A1 = {1}, A2 = {2, 3}, and A3 = S, we have B1 = {1} and B2 = {2, 3}. ♦
Example 2.1.8 Consider the algebra {∅, {a}, {b}, {a, b}, {c, d}, {b, c, d}, {a, c, d},
S} of S = {a, b, c, d}. For A1 = {a, b} and A2 = {b, c, d}, we have B1 = {a, b} and
B2 = {c, d}, or B1 = {a} and B2 = {b, c, d}. ♦
Example 2.1.9 Assume the algebra considered in Example 2.1.8. If A1 = {a, b},
A2 = {b, c, d}, and A3 = {a, c, d}, then B1 = {a}, B2 = {b}, and B3 = {c, d}. ♦
Example 2.1.10 Assume the algebra A({{1}, {2}, . . .}) generated from {{1}, {2}, . . .}, and let A_i = {2, 4, . . . , 2i}. Then, we have B_1 = {2}, B_2 = {4}, . . ., i.e., B_i = {2i} for i = 1, 2, . . .. It is clear that ∪_{i=1}^{∞} B_i = ∪_{i=1}^{∞} A_i. ♦
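The standard construction behind Theorem 2.1.3 takes B_i = A_i − (A_1 ∪ · · · ∪ A_{i−1}); applied to Example 2.1.8 it yields one of the two valid choices listed there:

```python
def disjointify(A_list):
    # B_i = A_i - (A_1 U ... U A_{i-1}): pairwise disjoint, same union as the A_i.
    B_list, covered = [], set()
    for A in A_list:
        B_list.append(set(A) - covered)
        covered |= set(A)
    return B_list

B = disjointify([{'a', 'b'}, {'b', 'c', 'd'}])   # the sets A1, A2 of Example 2.1.8
assert B == [{'a', 'b'}, {'c', 'd'}]             # the first valid choice listed there
assert B[0] & B[1] == set()                      # disjoint, cf. (2.1.5)
```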

2.1.2 Sigma Algebra

In some cases, the results in finite and infinite spaces are different. For example,
although the result from a finite number of set operations on the elements of an algebra

is an element of the algebra, the result from an infinite number of set operations is
not always an element of the algebra. This is similar to the fact that adding a finite
number of rational numbers results always in a rational number while adding an
infinite number of rational numbers sometimes results in an irrational number.

Example 2.1.11 Assume S = {1, 2, . . .}, a collection C of finite subsets of S, and


the algebra A (C) generated from C. Then, S is an element of A (C) although S can
be obtained from only an infinite number of unions of the element sets in A (C). On
the other hand, the set {2, 4, . . .}, a set that can also be obtained from only an infinite
number of unions of the element sets in A (C), is not an element of A (C). In other
words, while a finite number of unions of the element sets in A (C) would result in
an element of A (C), an infinite number of unions of the element sets in A (C) is not
guaranteed to be an element of A (C). ♦

As it is clear from the example above, the algebra is unfortunately not closed
under a countable number of set operations. We now define the notion of σ -algebra
by adding one desirable property to algebra.

Definition 2.1.3 (σ -algebra) An algebra that is closed under a countable number of


unions is called a sigma algebra or σ -algebra.

In other words, an algebra F is a σ-algebra if

∪_{i=1}^{∞} A_i ∈ F   (2.1.7)

for all element sets A1 , A2 , . . . of F. A sigma algebra is closed under a countable,


i.e., finite and countably infinite, number of set operations while an algebra is closed
under only a finite number of set operations. A sigma algebra is still an algebra, but
the converse is not necessarily true. An algebra and a σ -algebra are also called an
additive class of sets and a completely additive class of sets, respectively.

Example 2.1.12 For finite sets, an algebra is also a sigma algebra. ♦

Example 2.1.13 For a σ -algebra F of S, we always have S ∈ F and ∅ ∈ F from


Theorem 2.1.2 because σ -algebra is an algebra. ♦

Example 2.1.14 The collection

F = {∅, {a}, {b}, {c}, {d}, {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d},
{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, S} (2.1.8)

of sets from S = {a, b, c, d} is a σ -algebra. ♦

When the collection of all possible outcomes is finite as in a single toss of a coin
or a single rolling of a pair of dice, the limit, i.e., the infinite union in (2.1.7), does
not have significant implications and an algebra is also a sigma algebra. On the other

hand, when an algebra contains infinitely many element sets, the result of an infinite
number of unions of the element sets of the algebra does not always belong to the
algebra because an algebra is not closed under an infinite number of set operations.
Such a case occurs when the collection of all possible outcomes is from, for instance,
an infinite toss of a coin or an infinite rolling of a pair of dice.

Example 2.1.15 The space Υ = {a = (a1 , a2 , . . .) : ai ∈ {0, 1}} of one-sided binary


sequences is an uncountable set as discussed in Example 1.1.45. Consider the algebra

AΥ = A (G Υ ) (2.1.9)

generated from the collection G Υ = {{ai } : ai ∈ Υ } of singleton sets {ai }. Then,


some useful countably infinite sets such as

ΥT = {periodic binary sequences} (2.1.10)

described in Example 1.1.44 are not elements of the algebra AΥ because an infinite
set cannot be obtained by a finite number of set operations on the element sets of
GΥ . ♦

Example 2.1.16 Assuming Ω = J+ , consider the algebra A N = A (G) generated


from the collection G = {{1}, {2}, . . .}. Clearly, J+ ∈ A N and ∅ ∈ A N . On the other
hand, the set {2, 4, . . .} of even numbers is not an element of A N because the set of
even numbers cannot be obtained by a finite number of set operations on the element
sets of G, as we have already mentioned in Example 2.1.11. Therefore, A N is an
algebra, but is not a σ -algebra, of J+ . ♦

Example 2.1.17 Assuming Ω = Q, the set of rational numbers, let A N = A (G)


be the algebra generated from the collection G = {{1}, {2}, . . .} of singleton sets
of natural numbers. Clearly, A N is not a σ -algebra of Q. Note that the set J+ =
{1, 2, . . .} of natural numbers is not an element of A N because J+ cannot be obtained
by a finite number of set operations on the sets of G. ♦

Example 2.1.18 For Ω = [0, ∞), consider the collection

F1 = {[a, b), [a, ∞) : 0 ≤ a ≤ b < ∞} (2.1.11)

of intervals [a, b) and [a, ∞) with 0 ≤ a ≤ b < ∞, and the collection F2 obtained
from a finite number of unions of the intervals in F1 . We have [a, a) = ∅ ∈ F1 and
[a, ∞) = Ω ∈ F1 with a = 0. Yet, although [a, b) ∈ F1 for 0 ≤ a ≤ b < ∞, we
have [a, b)c = [0, a) ∪ [b, ∞) ∈/ F1 for 0 < a < b < ∞. Thus, F1 is not an algebra
of Ω. On the other hand, F2 is an algebra1 of Ω because a finite number of unions
of the elements in F1 is an element of F2 , the complement of every element in

1Here, if the condition ‘0 ≤ a ≤ b < ∞’ is replaced with ‘0 ≤ a < b < ∞’, F2 is not an algebra
because the null set is not an element of F2 .

F2 is an element of F2, ∅ ∈ F2, and Ω ∈ F2. However, F2 is not a σ-algebra of Ω = [0, ∞) because ∩_{n=1}^{∞} A_n = {0}, for instance, is2 not an element of F2 although A_n = [0, 1/n) ∈ F2 for n = 1, 2, . . .. ♦
Example 2.1.19 Assuming Ω = R, consider the collection A of results obtained from finite numbers of unions of intervals (−∞, a], (b, c], and (d, ∞) with b ≤ c. Then, A is an algebra of R but is not a σ-algebra because ∩_{n=1}^{∞} (b − 1/n, c] = [b, c] is not an element of A. ♦


Example 2.1.20 Assume the σ-algebras {F_i}_{i=1}^{∞} of a set Ω. Then, ∩_{n=1}^{∞} F_n is a σ-algebra. However, ∪_{n=1}^{∞} F_n is not always a σ-algebra. For example, for Ω = {ω1, ω2, ω3}, consider the two σ-algebras F1 = {∅, {ω1}, {ω2, ω3}, Ω} and F2 = {∅, {ω2}, {ω1, ω3}, Ω}. Then, F1 ∩ F2 = {∅, Ω} is a σ-algebra, but the collection F1 ∪ F2 = {∅, {ω1}, {ω2}, {ω2, ω3}, {ω1, ω3}, Ω} is not even an algebra. As another example, consider the sequence

F1 = {∅, Ω, {ω1}, Ω − {ω1}},   (2.1.12)
F2 = {∅, Ω, {ω2}, Ω − {ω2}},   (2.1.13)
...

of sigma algebras of Ω = {ω1, ω2, . . .}. Then,

∪_{n=1}^{∞} F_n = {∅, Ω, {ω1}, {ω2}, . . . , Ω − {ω1}, Ω − {ω2}, . . .}   (2.1.14)

is not an algebra. ♦
Definition 2.1.4 (generated σ -algebra) Consider a collection G of subsets of Ω.
The smallest σ -algebra that contains all the element sets of G is called the σ -algebra
generated from G and is denoted by σ (G).
The implication of the σ-algebra σ(G) being the smallest σ-algebra is that any σ-algebra which contains all the elements of G will also contain all the elements of σ(G).
Example 2.1.21 For S = {a, b, c, d}, the σ -algebra generated from C = {{a}} is
σ (C) = {∅, {a}, {b, c, d}, S}. ♦
Example 2.1.22 For the uncountable set Υ = {a = (a1 , a2 , . . .) : ai ∈ {0, 1}} of
one-sided binary sequences, consider the algebra A (G Υ ) and σ -algebra σ (G Υ ) gen-
erated from G Υ = {{ai } : ai ∈ Υ }. Then, as we have observed in Example 2.1.15,
the collection

2 This result is from (1.5.11).



ΥT = {periodic binary sequences} (2.1.15)

is not included in A (G Υ ) because all the element sets of A (G Υ ) contain a finite


number of ai ’s while ΥT is an infinite set. On the other hand, we have

ΥT ∈ σ (G Υ ) (2.1.16)

by the definition of a sigma algebra. ♦


Based on the concept of σ -algebra, we will discuss the notion of probability
space in the next section. In particular, the concept of σ -algebra plays a key role in
the continuous probability space.

2.2 Probability Spaces

A probability space (Gray and Davisson 2010; Loeve 1977) is the triplet (Ω, F, P)
of an abstract space Ω, called the sample space; a sigma algebra F, called the event
space, of the sample space; and a set function P, called the probability measure,
assigning a number in [0, 1] to each of the element sets of the event space.

2.2.1 Sample Space

Definition 2.2.1 (random experiment) An experiment that can be repeated under


perfect control, yet the outcome of which is not known in advance, is called a random
experiment or, simply, an experiment.
Example 2.2.1 Tossing a coin is a random experiment because it is not possible to
predict the exact outcome even under a perfect control of the environment. ♦
Example 2.2.2 Making a product in a factory can be modelled as a random experiment because even the same machine would not be able to produce two identical products. ♦

Example 2.2.3 Although the law of inheritance is known, it is not possible to know
exactly, for instance, the color of eyes of a baby in advance. A probabilistic model
is more appropriate. ♦
Example 2.2.4 In any random experiment, the procedure, observation, and model
should be described clearly. For example, toss a coin can be described as follows:
• Procedure. A coin will be thrown upward and fall freely down to the floor.
• Observation. When the coin stops moving, the face upward is observed.
• Model. The coin is symmetric and previous outcomes do not influence future
outcomes. ♦

Definition 2.2.2 (sample space) The collection of all possible outcomes of an exper-
iment is called the sample space of the experiment.
The sample space, often denoted by S or Ω, is basically the same as the abstract
space in set theory.
Definition 2.2.3 (sample point) An element of the sample space is called a sample
point or an elementary outcome.
Example 2.2.5 In toss a coin, the sample space is S = {head, tail} and the sample
points are head and tail. In rolling a fair die, the sample space is S = {1, 2, 3, 4, 5, 6}
and the sample points are 1, 2, . . . , 6. In the experiment of rolling a die until a certain
number appears, the sample space is S = {1, 2, . . .} and the sample points are 1, 2, . . .
when the observation is the number of rolling. ♦
Example 2.2.6 In the experiment of choosing a real number between a and b ran-
domly, the sample space is Ω = (a, b). ♦
The sample spaces in Example 2.2.5 are countable sets, which are often called
discrete sample spaces or discrete spaces. The sample space Ω = (a, b) considered
in Example 2.2.6 is an uncountable space and is called a continuous sample space
or continuous space.
A finite dimensional vector space from a discrete space is, again, a discrete space:
on the other hand, it should be noted that an infinite dimensional vector space from
a discrete space, which is called a sequence space, is a continuous space. A mixture
of discrete and continuous spaces is called a mixed sample space or a hybrid sample
space.
Let us generally denote by I the index set such as the set R of real numbers, the
set

R0 = {x : x ≥ 0} (2.2.1)

of non-negative real numbers, the set

Jk = {0, 1, . . . , k − 1} (2.2.2)

of integers from 0 to k − 1, the set J+ = {1, 2, . . .} of natural numbers, and the set J = {. . . , −1, 0, 1, . . .} of integers. Then, the product space ∏_{t∈I} Ω_t of the sample spaces {Ω_t, t ∈ I} can be described as

∏_{t∈I} Ω_t = {all {a_t, t ∈ I} : a_t ∈ Ω_t},   (2.2.3)

which can also be written as Ω^I if it incurs no confusion or if Ω_t = Ω.


Definition 2.2.4 (discrete combined space) For combined random experiments on
two discrete sample spaces Ω1 of size m and Ω2 of size n, the sample space Ω =
Ω1 × Ω2 of size mn is called a discrete combined space.

Example 2.2.7 When a coin is tossed and a die is rolled at the same time, the sam-
ple space is the discrete combined space S = {(head, 1), (head, 2), . . . , (head, 6),
(tail, 1), (tail, 2), . . . , (tail, 6)} of size 2 × 6 = 12. ♦

Example 2.2.8 When two coins are tossed once or a coin is tossed twice, the sample
space is the discrete combined space S = {(head, head), (head, tail), (tail, head),
(tail, tail)} of size 2 × 2 = 4. ♦
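A discrete combined space such as the one in Example 2.2.7 is just a Cartesian product, which can be enumerated with itertools.product:

```python
from itertools import product

coin = ['head', 'tail']
die = [1, 2, 3, 4, 5, 6]
S = list(product(coin, die))   # the discrete combined space of Example 2.2.7
assert len(S) == len(coin) * len(die) == 12
assert ('head', 1) in S and ('tail', 6) in S
```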

Example 2.2.9 Assume the sample space Ω = {0, 1} and let Ω_1 = Ω_2 = · · · = Ω_k = Ω. Then, the space ∏_{i=1}^{k} Ω_i = Ω^k = {(a_1, a_2, . . . , a_k) : a_i ∈ {0, 1}} is the k-fold Cartesian space of Ω, a space of binary vectors of length k, and an example of a product space. ♦

Example 2.2.10 Assume the discrete space Ω = {a_1, a_2, . . . , a_m}. The space Ω^k = {all vectors b = (b_1, b_2, . . . , b_k) : b_i ∈ Ω} of k-dimensional vectors from Ω, i.e., the k-fold Cartesian space of Ω, is a discrete space as we have already observed indirectly in Examples 1.1.41 and 1.1.42. On the other hand, the sequence spaces Ω^J = {all infinite sequences {. . . , c_{−1}, c_0, c_1, . . .} : c_i ∈ Ω} and Ω^{J+} = {all infinite sequences {d_1, d_2, . . .} : d_i ∈ Ω} are continuous spaces, although they are derived from a discrete space. ♦

2.2.2 Event Space

Definition 2.2.5 (event space; event) A sigma algebra obtained from a sample space
is called an event space, and an element of the event space is called an event.

An event in probability theory is roughly another name for a set in set theory.
Nonetheless, a non-measurable set discussed in Appendix 2.4 cannot be an event
and not all measurable sets are events: again, only the element sets of an event space
are events.

Example 2.2.11 Consider the sample space S = {1, 2, 3} and the event space C =
{{1}, {2, 3}, S, ∅}. Then, the subsets {1}, {2, 3}, S, and ∅ of S are events. However,
the other subsets {2}, {3}, {1, 3}, and {1, 2} of S are not events. ♦

As we can easily observe in Example 2.2.11, every event is a subset of the sample
space, but not all the subsets of a sample space are events.

Definition 2.2.6 (elementary event) An event that is a singleton set is called an elementary event.

Example 2.2.12 For a coin toss, the sample space is S = {head, tail}. If we assume
the event space F = {S, ∅, {head}, {tail}}, then the sets S, {head}, {tail}, and ∅ are
events, among which {head} and {tail} are elementary events. ♦

It should be noted that only a set can be an event. Therefore, no element of a sample space can be an event, and only a subset of a sample space can possibly be an event.

Example 2.2.13 For rolling a die with the sample space S = {1, 2, 3, 4, 5, 6}, the
element 1, for example, of S can never be an event. The subset {1, 2, 3} may
sometimes be an event: specifically, the subset {1, 2, 3} is and is not an event for
the event spaces F = {{1, 2, 3}, {4, 5, 6}, S, ∅} and F = {{1, 2}, {3, 4, 5, 6}, S, ∅},
respectively. ♦

In addition, when a set is an event, the complement of the set is also an event even
if it cannot happen.

Example 2.2.14 Consider an experiment of measuring the current through a circuit in the sample space R. If the set {a : a ≤ 1000 mA} is an event, then the complement {a : a > 1000 mA} will also be an event even if the current through the circuit cannot exceed 1000 mA. ♦

For a sample space, several event spaces may exist. For a sample space Ω, the
collection {∅, Ω} is the smallest event space and the power set 2Ω , the collection of
all the subsets of Ω as described in Definition 1.1.13, is the largest event space.

Example 2.2.15 For a game of rock-paper-scissors with the sample space Ω = {rock, paper, scissors}, the smallest event space is F_S = {∅, Ω} and the largest event space is F_L = 2^Ω = {∅, {rock}, {paper}, {scissors}, {rock, paper}, {paper, scissors}, {scissors, rock}, Ω}. ♦
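For a finite sample space such as this one, the closure requirements of an event space can be checked mechanically. The following sketch (illustrative code, not from the text; the helper names are ours) builds the largest event space 2^Ω for the rock-paper-scissors sample space and verifies that both F_S and F_L are closed under complement and pairwise union, while a collection that misses a complement is rejected.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return the power set of omega as a set of frozensets."""
    items = list(omega)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

def is_event_space(omega, collection):
    """Check the sigma-algebra conditions on a finite sample space:
    contains omega, closed under complement, closed under union."""
    omega = frozenset(omega)
    if omega not in collection:
        return False
    if any(omega - a not in collection for a in collection):
        return False
    return all(a | b in collection for a in collection for b in collection)

omega = {"rock", "paper", "scissors"}
F_S = {frozenset(), frozenset(omega)}   # smallest event space
F_L = power_set(omega)                  # largest event space, 2^Omega
bad = {frozenset(), frozenset({"rock"}), frozenset(omega)}  # misses a complement

print(len(F_L))                                                 # 8
print(is_event_space(omega, F_S), is_event_space(omega, F_L))   # True True
print(is_event_space(omega, bad))                               # False
```

For a finite sample space, checking pairwise unions suffices because any finite union can be built up two sets at a time.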

Let us now briefly describe why we base the probability theory on the more
restrictive σ -algebra, and not on algebra. As we have noted before, when the sample
space is finite, we could also have based the probability theory on algebra because
an algebra is basically the same as a σ -algebra. However, if the probability theory
is based on algebra when the sample space is an infinite set, it becomes impossible
to take some useful sets3 as events and to consider the limit of events, as we can see
from the examples below.

Example 2.2.16 If the event space is defined not as a σ -algebra but as an algebra,
then the limit is not an element of the event space for some sequences of events. For
example, even when all finite intervals (a, b) are events, no singleton set is an event
because a singleton set cannot be obtained from a finite number of set operations
on intervals (a, b). In a more practical scenario, even if “The voltage measured is
between a and b (V).” is an event, “The voltage measured is a (V).” would not be
an event if the event space were defined as an algebra. ♦

3 Among those useful sets is the set ΥT = {all periodic binary sequences} considered in Example 2.1.15.

As we have already mentioned in Sect. 2.1.2, when the sample space is finite, the
limit is not so crucial and an algebra is a σ -algebra. Consequently, the probability
theory could also be based on algebra. When the sample space is an infinite set and
the event space is composed of an infinite number of sets, however, an algebra is
not closed under an infinite number of set operations. In such a case, the result of an
infinite number of operations on sets, i.e., the limit of a sequence, is not guaranteed
to be an event. In short, the fact that a σ -algebra is closed under a countable number
of set operations is the reason why we adopt the more restrictive σ -algebra as an
event space, and not the more general algebra.
An event space is a collection of subsets of the sample space closed under a countable number of unions. We can show, for instance via de Morgan's law, that an event space is closed also under a countable number of other set operations such as difference, complement, intersection, etc. It should be noted that an event space is closed for a countable number of set operations, but not for an uncountable number of set operations. For example, the set ∪_{r=1}^∞ Hr is also an event when {Hr}_{r=1}^∞ are all events, but the set

∪_{r∈[0,1]} Br    (2.2.4)

may or may not be an event when {Br : r ∈ [0, 1]} are all events.
Let us now discuss in some detail the condition

∪_{i=1}^∞ Bi ∈ F    (2.2.5)

shown originally in (2.1.7), where {Bn}_{n=1}^∞ are all elements of the event space F. This condition implies that the event space is closed under a countable number of union operations. Recollect that the limit lim_{n→∞} Bn of {Bn}_{n=1}^∞ is defined as

lim_{n→∞} Bn = ∪_{i=1}^∞ Bi when {Bn}_{n=1}^∞ is a non-decreasing sequence, and
lim_{n→∞} Bn = ∩_{i=1}^∞ Bi when {Bn}_{n=1}^∞ is a non-increasing sequence    (2.2.6)

in (1.5.8) and (1.5.9). It is clear that a sequence {Bn}_{n=1}^∞ of events in F will also satisfy

lim_{n→∞} Bn ∈ F, {Bn}_{n=1}^∞ a non-decreasing sequence,    (2.2.7)

when the sequence satisfies (2.2.5).

Example 2.2.17 Let a sequence {Hn}_{n=1}^∞ of events in an event space F be non-decreasing. As we have seen in (1.5.8) and (2.2.6), the limit of {Hn}_{n=1}^∞ can be expressed in terms of the countable union as lim_{n→∞} Hn = ∪_{n=1}^∞ Hn. Because a countable number of unions of events results in an event, the limit lim_{n→∞} Hn is an event. For example, when {[1, 2 − 1/n)}_{n=1}^∞ are all events, the limit [1, 2) of this non-decreasing sequence of events is an event. In addition, when finite intervals of the form (a, b) are all events, the limit (−∞, b) of {(−n, b)}_{n=1}^∞ will also be an event. Similarly, assume a non-increasing sequence {Bn}_{n=1}^∞ of events in F. The limit of this sequence, lim_{n→∞} Bn = ∩_{n=1}^∞ Bn as shown in (1.5.9) or (2.2.6), will also be an event because it is a countable intersection of events. Therefore, if finite intervals (a, b) are all events, any singleton set {a}, the limit of {(a − 1/n, a + 1/n)}_{n=1}^∞, will also be an event. ♦
Example 2.2.18 Let us show the equivalence of (2.2.5) and (2.2.7). We have already observed that (2.2.7) holds true for a collection of events satisfying (2.2.5). Let us thus show that (2.2.5) holds true for a collection of events satisfying (2.2.7). Consider a sequence {Gi}_{i=1}^∞ of events chosen arbitrarily in F and let Hn = ∪_{i=1}^n Gi. Then, ∪_{n=1}^∞ Gn = ∪_{n=1}^∞ Hn and {Hn}_{n=1}^∞ is a non-decreasing sequence. In addition, because {Hn}_{n=1}^∞ is a non-decreasing sequence, we have ∪_{i=1}^∞ Hi = lim_{n→∞} Hn from (2.2.6). Therefore, ∪_{n=1}^∞ Gn = ∪_{n=1}^∞ Hn = lim_{n→∞} Hn ∈ F. In other words, for any sequence {Gi}_{i=1}^∞ of events in F satisfying (2.2.7), we have

∪_{n=1}^∞ Gn ∈ F.    (2.2.8)

In essence, the two conditions (2.2.5) and (2.2.7) are equivalent, which implies that (2.2.7), instead of (2.2.5), can be employed as one of the requirements for a collection to be an event space. Similarly, instead of (2.2.5),

lim_{n→∞} Bn ∈ F, {Bn}_{n=1}^∞ a non-increasing sequence,    (2.2.9)

can be employed as one of the requirements for a collection to be an event space. ♦


When the sample space is the space of real numbers, the representative of con-
tinuous spaces, we now consider a useful event space based on the notion of the
smallest σ -algebra described in Definition 2.1.4.
Definition 2.2.7 (Borel σ -algebra) When the sample space is the real line R, the
sigma algebra

B (R) = σ (all open intervals) (2.2.10)

generated from all open intervals (a, b) in R is called the Borel algebra, Borel sigma
field, or Borel field of R.
The members of the Borel field, i.e., the sets obtained from a countable number
of set operations on open intervals, are called Borel sets.

Example 2.2.19 It is possible to see that singleton sets {x} = ∩_{n=1}^∞ (x − 1/n, x + 1/n), half-open intervals [x, y) = (x, y) ∪ {x} and (x, y] = (x, y) ∪ {y}, and closed intervals [x, y] = (x, y) ∪ {x} ∪ {y} are all Borel sets after some set operations. In addition, half-open intervals [x, +∞) = (−∞, x)^c and (−∞, x] = (−∞, x) ∪ {x}, and open intervals (x, ∞) = (−∞, x]^c are also Borel sets. ♦

The Borel σ -algebra B (R) is the most useful and widely-used σ -algebra on the
set of real numbers, and contains all finite and infinite open, closed, and half-open
intervals, singleton sets, and the results from set operations on these sets. On the other
hand, the Borel σ -algebra B (R) is different from the collection of all subsets of R.
In other words, there exist some subsets of real numbers which are not contained in
the Borel σ -algebra. One such example is the Vitali set discussed in Appendix 2.4.
When the sample space is the real line R, we choose the Borel σ-algebra B(R) as our event space. At the same time, when a subset Ω of real numbers is the sample space, the Borel σ-algebra

B(Ω) = {G : G = H ∩ Ω, H ∈ B(R)}    (2.2.11)

of Ω is assumed as the event space. Note that when the sample space is a discrete subset A of the set of real numbers, the Borel σ-algebra B(A) of A is the same as the power set 2^A of A.

2.2.3 Probability Measure

We now consider the notion of probability measure, the third element of a probability
space.

Definition 2.2.8 (measurable space) The pair (Ω, F) of a sample space Ω and an
event space F is called a measurable space.

Let us again mention that when the sample space S is countable or discrete, we usually assume the power set of the sample space as the event space: in other words, the measurable space is (S, 2^S). When the sample space S is uncountable or continuous, we assume the event space described by (2.2.11): in other words, the measurable space is (S, B(S)).

Definition 2.2.9 (probability measure) On a measurable space (Ω, F), a set function P assigning a real number P(Bi) to each set Bi ∈ F under the constraint of the following four axioms is called a probability measure or simply probability:

Axiom 1.

P(Bi) ≥ 0.    (2.2.12)

Axiom 2.

P(Ω) = 1.    (2.2.13)

Axiom 3. When a finite number of events {Bi}_{i=1}^n are mutually exclusive,

P(∪_{i=1}^n Bi) = Σ_{i=1}^n P(Bi).    (2.2.14)

Axiom 4. When a countable number of events {Bi}_{i=1}^∞ are mutually exclusive,

P(∪_{i=1}^∞ Bi) = Σ_{i=1}^∞ P(Bi).    (2.2.15)

The probability measure P is a set function assigning a value P(G), called prob-
ability and also denoted by P{G} and Pr{G}, to an event G. A probability measure
is also called a probability function, probability distribution, or distribution.
Axioms 1–4 are also intuitively appealing. The first axiom that a probability
should be not smaller than 0 is in some sense chosen arbitrarily like other measures
such as area, volume, and weight. The second axiom is a mathematical expression
that something happens from an experiment or some outcome will result from an
experiment. The third axiom is called additivity or finite additivity, and implies
that the probability of the union of events with no common element is the sum of
the probability of each event, which is similar to the case of adding areas of non-
overlapping regions.
Axiom 4 is called the countable additivity, which is an asymptotic generalization
of Axiom 3 into the limit. This axiom is the key that differentiates the modern
probability theory developed by Kolmogorov from the elementary probability theory.
When evaluating the probability of an event which can be expressed, for example,
only by the limit of events, Axiom 4 is crucial: such an asymptotic procedure is similar
to obtaining the integral as the limit of a series. It should be noted that (2.2.14) does
not guarantee (2.2.15). In some cases, (2.2.14) is combined into (2.2.15) by viewing
Axiom 3 as a special case of Axiom 4.
If our definition of probability is based on the space of an algebra, then we may not
be able to describe, for example, some probability resulting from a countably infinite
number of set operations. To guarantee the existence of the probability in such a case
as well, we need sigma algebra which guarantees the result of a countably infinite
number of set operations to exist within our space.
From the axioms of probability, we can obtain

P(∅) = 0,    (2.2.16)

P(B^c) = 1 − P(B),    (2.2.17)

and

P(B) ≤ 1,    (2.2.18)

where B is an event. In addition, based on the axioms of probability and (2.2.16), the only probability measure is P(Ω) = 1 and P(∅) = 0 in the measurable space (Ω, F) with event space F = {Ω, ∅}.
Example 2.2.20 In a fair4 coin toss, consider the sample space Ω = {head, tail} and event space F = 2^Ω. Then, we have5 P(head) = P(tail) = 1/2, P(Ω) = 1, and P(∅) = 0. ♦
Example 2.2.21 Consider the sample space Ω = {0, 1} and event space F = {{0}, {1}, Ω, ∅}. Then, an example of the probability measure on this measurable space (Ω, F) is P(0) = 3/10, P(1) = 7/10, P(∅) = 0, and P(Ω) = 1. ♦
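For a measure given on a finite measurable space, the axioms of Definition 2.2.9 can be verified directly by enumeration. A minimal sketch (the helper names are ours) using the measure of the preceding example:

```python
from itertools import chain, combinations

# Probability measure of Example 2.2.21 on Omega = {0, 1}
point_mass = {0: 3 / 10, 1: 7 / 10}
omega = frozenset(point_mass)

def prob(event):
    """P(A) for an event A given as a collection of outcomes."""
    return sum(point_mass[w] for w in event)

# all events of the event space 2^Omega
events = [frozenset(c) for c in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

axiom1 = all(prob(a) >= 0 for a in events)            # nonnegativity
axiom2 = abs(prob(omega) - 1) < 1e-12                 # P(Omega) = 1
# additivity for the disjoint events {0} and {1}
axiom3 = abs(prob({0, 1}) - (prob({0}) + prob({1}))) < 1e-12

print(axiom1, axiom2, axiom3)   # True True True
```

Countable additivity (Axiom 4) cannot be exercised on a finite space, which is one reason an algebra and a σ-algebra coincide there.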
In a discrete sample space, the power set of the sample space is assumed to
be the event space and the event space may contain all the subsets of the discrete
sample space. On the other hand, if we choose the event space containing all the
possible subsets of a continuous sample space, not only will we be confronted with
contradictions6 but the event space will be too large to be useful. Therefore, when
dealing with continuous sample spaces, we choose an appropriate event space such
as the Borel sigma algebra mentioned in Definition 2.2.7 and assign a probability
only to those events. In addition, in discrete sample spaces, assigning probability to
a singleton set is meaningful whereas it is not useful in continuous sample spaces.
Example 2.2.22 Consider a random experiment of choosing a real number between
0 and 1 randomly. If we assign the probability to a number, then the probability of
choosing any number will be zero because there are uncountably many real numbers
between 0 and 1. However, it is not possible to obtain the probability of, for instance,
‘the number chosen is between 0.1 and 0.5’ when the probability of choosing a
number is 0. In essence, in order to have a useful model in continuous sample space,
we should assign probabilities to interval sets, not to singleton sets. ♦
Definition 2.2.10 (probability space) The triplet (Ω, F, P) of a sample space Ω,
an event space F of Ω, and a probability measure P is called a probability space.
It is clear from Definition 2.2.10 that the event space F can be chosen as an algebra
instead of a σ -algebra when the sample space S is finite, because an algebra is the
same as a σ -algebra in finite sample spaces.

4 Here, fair means 'head and tail are equally likely to occur'.
5 Because the probability measure P is a set function, P({k}) and P({head}), for instance, are the exact expressions. Nonetheless, the expressions P(k), P{k}, P(head), and P{head} are also used.
6 The Vitali set V0 discussed in Definition 2.A.12 is a subset in the space of real numbers. Denote the rational numbers in (−1, 1) by {αi}_{i=1}^∞ and assume the translation operation Tt(x) = x + t. Then, the events {Tαi V0}_{i=1}^∞ will produce a contradiction.

2.3 Probability

We now discuss the properties of probability and alternative definitions of probability (Gut 1995; Mills 2001).

2.3.1 Properties of Probability



Assume a probability space (Ω, F, P) and events {Bi ∈ F}_{i=1}^∞. Then, based on (2.2.12)–(2.2.15), we can obtain the following properties:

Property 1. We have

P(Bi) ≤ P(Bj) if Bi ⊆ Bj    (2.3.1)

and7

P(∪_{i=1}^∞ Bi) ≤ Σ_{i=1}^∞ P(Bi).    (2.3.2)


Property 2. If {Bi}_{i=1}^∞ is a countable partition of the sample space Ω, then

P(G) = Σ_{i=1}^∞ P(G ∩ Bi)    (2.3.3)

for G ∈ F.
Property 3. Denoting the sum over the C(n, r) = n!/{r!(n − r)!} ways of choosing r events from {Bi}_{i=1}^n by Σ_{i1<i2<···<ir} P(Bi1 Bi2 · · · Bir), we have

P(∪_{i=1}^n Bi) = (−1)^0 Σ_{i=1}^n P(Bi) + (−1)^1 Σ_{i1<i2} P(Bi1 Bi2) + · · · + (−1)^{r−1} Σ_{i1<i2<···<ir} P(Bi1 Bi2 · · · Bir) + · · · + (−1)^{n−1} P(B1 B2 · · · Bn).    (2.3.4)

Figure 2.1 illustrates (2.3.3). We can rewrite (2.3.4) as

P (B1 ∪ B2 ) = P (B1 ) + P (B2 ) − P (B1 ∩ B2 ) (2.3.5)

7 Among these properties, (2.3.2) is called the Boole inequality. The Boole inequality (2.3.2) can
also be written into two formulas similarly to Axioms 3 and 4 in Definition 2.2.9.
Fig. 2.1 The property P(G) = Σ_{i=1}^∞ P(G ∩ Bi) of probability: the sample space Ω is partitioned into B1, B2, B3, B4, and the event G is composed of G ∩ B1, G ∩ B2, G ∩ B3, and G ∩ B4

and

P(B1 ∪ B2 ∪ B3) = P(B1) + P(B2) + P(B3) − P(B1 ∩ B2) − P(B2 ∩ B3) − P(B3 ∩ B1) + P(B1 ∩ B2 ∩ B3)    (2.3.6)

more specifically when n = 2 and 3, respectively.


Example 2.3.1 When P(A ∪ B) = 0.7, P(A) = 0.5, and P(B) = 0.3 in a proba-
bility space, obtain P(A ∩ B).
Solution We get P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 0.1 from (2.3.5). ♦
Example 2.3.2 Consider a random experiment of tossing a fair coin three times. Obtain the probability p1 that head occurs at least once.
Solution There exist 2^3 = 8 elementary events with equal probability 1/8. Therefore,

p1 = P({head, head, head}) + P({head, head, tail}) + P({head, tail, head}) + P({tail, head, head}) + P({head, tail, tail}) + P({tail, head, tail}) + P({tail, tail, head}) = 7/8,    (2.3.7)

which can also be obtained as p1 = 1 − 1/8 = 7/8 by noting that the event 'head occurs at least once' is the complementary event of 'tail occurs three times'. ♦
Note that probability being zero for an event does not necessarily mean that the
event does not occur or, equivalently, that the event is the same as the ‘impossible’
event ∅. For example, in the space of real numbers, although P({a}) = 0 for any
value of a, the event {a} is different from ∅. In general, A and B are called the same
in probability when P(A) = P(B). Similarly, when

P(A) = P(B)
= P(AB) (2.3.8)

or P((A + B) − AB) = P ( AB c + Ac B) = 0, i.e., when

P(AΔB) = 0, (2.3.9)

A and B are called the same with probability 1, in which8 ‘with probability 1’ can
be replaced by ‘almost everywhere (a.e.)’, ‘almost always’, ‘almost certainly (a.c.)’,
‘almost surely (a.s.)’, and ‘almost every point’. For example, when P(A) = 0, A is the
same as ∅ almost surely because P(∅) = P(A) = P(A ∩ ∅) = 0. When P(A) = 1,
A is the same as Ω almost surely because P(Ω) = P(A) = P(A ∩ Ω) = 1. Note
that A being the same as B almost surely does not necessarily mean A = B.

Example 2.3.3 For the sample space Ω = [0, 1], let the probability of an interval be the length of the interval. Consider the four intervals A1 = [0.1, 0.2], A2 = [0.1, 0.2), A3 = (0.1, 0.2], and A4 = (0.1, 0.2). Although P(Ai) = P(Aj) = P(Ai Aj) = 0.1 for any i and j, it is clear that Ai ≠ Aj for i ≠ j. In other words, when i ≠ j, Ai and Aj are the same in probability and with probability 1, but they are not the same event. In addition, A1 and B = [0.3, 0.4] are the same in probability because P(A1) = P(B) = 0.1, but they are neither the same with probability 1 nor the same. ♦

Theorem 2.3.1 For events {Ai }i=1


n
in a probability space, we have9


n 
n 
n   
n
n
P (Ai ) − P ( Ai ∩ A k ) ≤ P ∪ Ai ≤ P (Ai ) . (2.3.10)
i=1
i=1 i=1 k=i+1 i=1

Proof If we let

Bi = Ai − ∪_{k=i+1}^n Ak for i = 1, 2, . . . , n − 1, and Bn = An,    (2.3.11)

then we have P(∪_{i=1}^n Ai) = P(∪_{i=1}^n Bi) from ∪_{i=1}^n Bi = ∪_{i=1}^n Ai and, subsequently,

P(∪_{i=1}^n Ai) = Σ_{i=1}^n P(Bi)    (2.3.12)

because Bi ∩ Bk = ∅ for i ≠ k. We also have

P(Bi) ≤ P(Ai)    (2.3.13)

8 This has been described also in Definition 1.1.24.
9 This formula is called the Bonferroni inequality.

from Bi ⊆ Ai. Combining (2.3.12) and (2.3.13), we get

P(∪_{i=1}^n Ai) ≤ Σ_{i=1}^n P(Ai).    (2.3.14)

Next, recollect that P(C − D) = P(C) − P(CD) for two sets C and D from P(C) = P(C − D) + P(CD) because {C − D, CD} is a partition of C. Then, noting that P(Ai ∩ ∪_{k=i+1}^n Ak) = P(∪_{k=i+1}^n (Ai ∩ Ak)) because Ai ∩ ∪_{k=i+1}^n Ak = ∪_{k=i+1}^n (Ai ∩ Ak) from (1.1.22), we have P(Bi) = P(Ai − ∪_{k=i+1}^n Ak) = P(Ai) − P(Ai ∩ ∪_{k=i+1}^n Ak), i.e.,

P(Bi) = P(Ai) − P(∪_{k=i+1}^n (Ai ∩ Ak))    (2.3.15)

based on (2.3.11). We then get

P(Ai) − Σ_{k=i+1}^n P(Ai ∩ Ak) ≤ P(Ai) − P(∪_{k=i+1}^n (Ai ∩ Ak)) = P(Bi)    (2.3.16)

from (2.3.15) because P(∪_{k=i+1}^n (Ai ∩ Ak)) ≤ Σ_{k=i+1}^n P(Ai ∩ Ak). Now, from (2.3.12) and (2.3.16), we get

Σ_{i=1}^n P(Ai) − Σ_{i=1}^n Σ_{k=i+1}^n P(Ai ∩ Ak) ≤ Σ_{i=1}^n P(Bi) = P(∪_{i=1}^n Ai),    (2.3.17)

which, when combined with (2.3.14), confirms (2.3.10). ♠

Example 2.3.4 Assume the sample space S = {1, 2, . . . , 10}, event space 2^S, and probability measure P(k) = k/55 for k = 1, 2, . . . , 10. Consider the three events A1 = {1, 2, 3}, A2 = {3, 4, 5, 6}, and A3 = {5, 6, 7, 8}. First, we have P(∪_{i=1}^3 Ai) = 36/55. We also get Σ_{i=1}^3 P(Ai) = 50/55 from P(A1) = 6/55, P(A2) = 18/55, and P(A3) = 26/55. Finally, from P(A1 ∩ A2) = 3/55, P(A2 ∩ A3) = 11/55, and P(A3 ∩ A1) = 0, we have Σ_{i=1}^3 P(Ai) − Σ_{i=1}^3 Σ_{k=i+1}^3 P(Ai ∩ Ak) = 50/55 − (3/55 + 11/55) = 36/55. In short, (2.3.10) is confirmed. ♦
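The bounds in (2.3.10) are also easy to check numerically. The sketch below (illustrative code, with helper names of our choosing) recomputes the quantities of Example 2.3.4 and confirms that the probability of the union lies between the Bonferroni lower bound and the Boole upper bound of (2.3.2).

```python
from itertools import combinations

def prob(event):
    """P(k) = k/55 on S = {1, ..., 10}, as in Example 2.3.4."""
    return sum(k / 55 for k in event)

A = {1: {1, 2, 3}, 2: {3, 4, 5, 6}, 3: {5, 6, 7, 8}}

p_union = prob(A[1] | A[2] | A[3])
upper = sum(prob(A[i]) for i in A)                           # Boole bound
lower = upper - sum(prob(A[i] & A[k])
                    for i, k in combinations(sorted(A), 2))  # Bonferroni bound

print(round(p_union * 55), round(lower * 55), round(upper * 55))   # 36 36 50
```

Here the lower bound is tight because the pairwise intersections are small relative to the individual events.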

2.3.2 Other Definitions of Probability

Probability can be defined in several ways. In modern probability theory, probability is defined with the four axioms by adopting the notion of σ-algebra as we have described so far. We now introduce two other ways of defining probability.

2.3.2.1 Classical Definition

When all the outcomes from a random experiment are equally likely, the probability of an event can be defined by the ratio of the number of desired outcomes to the total number of outcomes. Specifically, the probability of A is given by the ratio

P(A) = NA/N,    (2.3.18)

where N is the number10 of all possible outcomes and NA is the number of desired outcomes for A. The condition of equally likely occurrence is the key in the classical definition.
Example 2.3.5 Obtain the probability of head when a fair coin is tossed once.
Solution There are two equally likely possible outcomes, head and tail, among which the desired outcome is head. Thus, the probability of head is 1/2. ♦
Example 2.3.6 Obtain the probability P3 that the three pieces are all longer than 1/4 when a rod of length 1 is divided into three pieces by choosing two points at random.
Solution View the rod of length 1 as the interval [0, 1] on the real line, and let the coordinates of the two cutting points be x and y with 0 < x < 1 and 0 < y < 1 as shown in Fig. 2.2. The three pieces will all be longer than 1/4 when x < y, 1/4 < x < 1/2, and x + 1/4 < y < 3/4, or when x > y, 1/4 < y < 1/2, and y + 1/4 < x < 3/4. Therefore, from

P3 = (area of the region of desired outcomes)/(area of the whole region),    (2.3.19)

we get P3 = (A1 + A2)/AT = ∫_{1/4}^{1/2} ∫_{x+1/4}^{3/4} 1 dy dx + ∫_{1/4}^{1/2} ∫_{y+1/4}^{3/4} 1 dx dy, i.e.,

10 Note that the number is sometimes replaced by other quantity such as area, volume, or length as
it is shown in Example 2.3.6.

Fig. 2.2 Points of division: 0 < x, y < 1 on the rod viewed as the interval [0, 1]

Fig. 2.3 Dividing a rod of length 1 into three pieces all longer than 1/4: the regions A1 and A2 in the unit square of (x, y)

P3 = 1/16    (2.3.20)

referring to Fig. 2.3, where AT = 1 denotes the area of the whole region {0 < x < 1, 0 < y < 1}, A1 is the area of the region {x < y, 1/4 < x < 1/2, x + 1/4 < y < 3/4}, and A2 is the area of the region {x > y, y + 1/4 < x < 3/4, 1/4 < y < 1/2}. ♦

Example 2.3.7 Obtain the probability PT that the three pieces will form a triangle when a rod is divided into three pieces by choosing two points at random.
Solution Let the length of the rod be 1, and follow the description in Example 2.3.6. Then, the lengths of the three pieces are x, y − x, and 1 − y when x < y, and y, x − y, and 1 − x when x > y. Thus, for the three pieces to form a triangle, it is required that {0 < x < 1, 0 < y < 1, x < y, y > 1/2, y < x + 1/2, x < 1/2} or {0 < x < 1, 0 < y < 1, x > y, x > 1/2, x < y + 1/2, y < 1/2} because the sum of the lengths of any two pieces should be longer than the length of the remaining piece. Consequently, from

PT = (area of the region of desired outcomes)/(area of the whole region),    (2.3.21)

we get PT = (A1 + A2)/AT, i.e.,

PT = 1/4,    (2.3.22)

where AT = 1 is the area of the region {0 < x < 1, 0 < y < 1}, A1 = 1/8 is the area of the region {0 < x < 1, 0 < y < 1, x < y, y > 1/2, y < x + 1/2, x < 1/2}, and A2 = 1/8 is the area of the region {0 < x < 1, 0 < y < 1, x > y, x > 1/2, x < y + 1/2, y < 1/2}. A similar problem will be discussed in Example 3.4.2. ♦
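Both of these geometric results can be spot-checked by Monte Carlo simulation. The sketch below (illustrative code; the trial count and seed are arbitrary choices of ours) cuts the rod at two uniform points and counts the events of Examples 2.3.6 and 2.3.7.

```python
import random

random.seed(1)
TRIALS = 200_000
n_long = n_tri = 0

for _ in range(TRIALS):
    # cut the rod [0, 1] at two uniformly chosen points
    x, y = sorted((random.random(), random.random()))
    a, b, c = x, y - x, 1 - y          # lengths of the three pieces
    if min(a, b, c) > 1 / 4:           # Example 2.3.6: all pieces longer than 1/4
        n_long += 1
    if max(a, b, c) < 1 / 2:           # Example 2.3.7: triangle inequality,
        n_tri += 1                     # since a + b + c = 1

p3_hat, pt_hat = n_long / TRIALS, n_tri / TRIALS
print(round(p3_hat, 3), round(pt_hat, 3))   # near 1/16 = 0.0625 and 1/4 = 0.25
```

Note that with a + b + c = 1, the triangle condition "each pairwise sum exceeds the third piece" is equivalent to every piece being shorter than 1/2.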

Fig. 2.4 Bertrand's paradox, Solution 1: the chord AB with center point M inside the circle of radius r/2

Example 2.3.8 Assume a collection of n integrated circuits (ICs), among which m are defective ones. When an IC is chosen randomly from the collection, obtain the probability α1 that the IC is defective.
Solution The number of ways to choose one IC among the n ICs is nC1, and the number of ways to choose one among the m defective ICs is mC1. Thus, α1 = mC1/nC1 = m/n. ♦

In obtaining the probability with the classical definition, it is important to enumerate the number of desired outcomes correctly, which is usually a problem of combinatorics. Another important consideration in the classical definition is the condition of equally likely outcomes, which will become more evident in Example 2.3.9.
Example 2.3.9 When a fair11 die is rolled twice, obtain the probability P7 that the sum of the two faces is 7.
Solution (Solution 1) There are 11 possible cases {2, 3, . . . , 12} for the sum. Therefore, one might say that P7 = 1/11.
(Solution 2) If we write the outcomes of the two rolls as {x, y}, x ≥ y, ignoring the order, then we have 21 possible cases {1, 1}, {2, 1}, {2, 2}, . . ., {6, 6}. Among the 21 cases, the three possibilities {4, 3}, {5, 2}, and {6, 1} yield a sum of 7. Based on this observation, P7 = 3/21 = 1/7.
(Solution 3) There are 36 equally likely outcomes (x, y) with x, y = 1, 2, . . . , 6, among which the six outcomes (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1) yield a sum of 7. Therefore, P7 = 6/36 = 1/6.
Among the three solutions, neither the first nor the second is correct because the cases in these two solutions are not equally likely. For instance, in the first solution, the sums 2 and 3 are not equally likely. Similarly, {1, 1} and {2, 1} are not equally likely in the second solution. In short, Solution 3 is the correct solution. ♦
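Solution 3 can be confirmed by enumerating the 36 equally likely ordered outcomes; a minimal sketch:

```python
from itertools import product

# all ordered outcomes (x, y) of two rolls of a fair die
outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if sum(o) == 7]

p7 = len(favorable) / len(outcomes)
print(len(outcomes), len(favorable))   # 36 6, so P7 = 1/6
```

Working with ordered pairs is what makes every listed case equally likely, which is exactly where Solutions 1 and 2 go wrong.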
In addition, especially when the number of possible outcomes is infinite, the
procedure, observation, and model, mentioned in Example 2.2.4, of an experiment
should be clearly depicted in advance.
Example 2.3.10 On a circle C of radius r, we draw a chord AB randomly. Obtain the probability PB that the length l of the chord is no shorter than √3 r. This problem is called Bertrand's paradox.

11 Here, ‘fair’ means ‘1, 2, 3, 4, 5, and 6 are equally likely to occur’.



Fig. 2.5 Bertrand's paradox, Solution 2: the endpoint B chosen on the circle with the point A fixed

Fig. 2.6 Bertrand's paradox, Solution 3: the chord AB orthogonal to a diameter FK, with the points G and H located r/2 from the center M on either side

Solution (Solution 1) Assume that the center point M of the chord is chosen randomly. As shown in Fig. 2.4, l ≥ √3 r is satisfied if the center point M is located in or on the circle C1 of radius r/2 with the same center as that of C. Thus, PB = (πr²/4)(πr²)⁻¹ = 1/4.
(Solution 2) Assume that the point B is selected randomly on the circle C with the point A fixed. As shown in Fig. 2.5, l ≥ √3 r is satisfied when the point B is on the shorter arc DE, where D and E are the two points (2/3)πr apart from A along C in the two directions. Therefore, we have PB = (2/3)πr (2πr)⁻¹ = 1/3 because the length of the shorter arc DE is (2/3)πr.
(Solution 3) Assume that the chord AB is drawn orthogonal to a diameter FK of C. As shown in Fig. 2.6, we then have l ≥ √3 r if the center point M is located between the two points H and G, located r/2 apart from K toward F and from F toward K, respectively. Therefore, PB = r/(2r) = 1/2.
This example illustrates that the experiment should be described clearly to obtain the probability appropriately with the classical definition. ♦
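The three answers can be reproduced by simulating the three chord-generation procedures. In the sketch below (illustrative code; function names, trial count, and seed are ours), each sampling rule is meant to match the corresponding solution above.

```python
import math
import random

random.seed(2)
R, N = 1.0, 100_000
THRESH = math.sqrt(3) * R      # chord no shorter than sqrt(3) * r

def by_midpoint():
    # Solution 1: midpoint of the chord uniform in the disk (rejection sampling)
    while True:
        mx, my = random.uniform(-R, R), random.uniform(-R, R)
        d2 = mx * mx + my * my
        if d2 <= R * R:
            return 2 * math.sqrt(R * R - d2)

def by_endpoints():
    # Solution 2: two endpoints uniform on the circle
    t1, t2 = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * R * abs(math.sin((t1 - t2) / 2))

def by_diameter():
    # Solution 3: midpoint uniform along a fixed radius
    d = random.uniform(0, R)
    return 2 * math.sqrt(R * R - d * d)

estimates = {}
for rule, exact in ((by_midpoint, 1 / 4), (by_endpoints, 1 / 3), (by_diameter, 1 / 2)):
    p = sum(rule() >= THRESH for _ in range(N)) / N
    estimates[rule.__name__] = p
    print(rule.__name__, round(p, 3), "exact", round(exact, 3))
```

The three estimates disagree because the phrase "a chord drawn randomly" corresponds to three different probability models, which is the point of the paradox.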

2.3.2.2 Definition Via Relative Frequency

Probability can also be defined in terms of the relative frequency of desired outcomes in a number of repetitions of a random experiment. Specifically, the relative frequency of a desired event A can be defined as

qn(A) = nA/n,    (2.3.23)

where nA is the number of occurrences of A and n is the number of trials. In many cases, the relative frequency qn(A) converges to a value as n becomes larger, and the probability P(A) of A can be defined as the limit

P(A) = lim_{n→∞} qn(A)    (2.3.24)

of the relative frequency. One drawback of this definition is that the limit shown in (2.3.24) can be obtained only by an approximation in practice.

Example 2.3.11 The probability that a person of a certain age will survive for a
year will differ from year to year, and thus it is difficult to use the classical definition
of probability. As an alternative in such a case, we assume that the tendency in the
future will be the same as that so far, and then compute the probability as the relative
frequency based on the records over a long period for the same age. This method is
often employed in determining an insurance premium. ♦
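The convergence in (2.3.24) can be illustrated by simulating a fair coin and watching qn(A) for A = {head} drift toward 1/2 as n grows. A sketch (seed and trial counts are arbitrary choices of ours):

```python
import random

random.seed(3)

def relative_frequency(n):
    """q_n(A) of (2.3.23) for A = {head} in n fair coin tosses."""
    n_heads = sum(random.random() < 0.5 for _ in range(n))
    return n_heads / n

freqs = {n: relative_frequency(n) for n in (10, 1_000, 100_000)}
for n, q in freqs.items():
    print(n, q)   # the estimates approach 0.5 as n grows
```

Of course, any finite run only approximates the limit, which is precisely the drawback noted above.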

2.4 Conditional Probability

Conditional probability (Helstrom 1991; Shiryaev 1996; Sveshnikov 1968; Weirich 1983) is one of the most important notions and powerful tools in probability theory.
It is the probability of an event when the occurrence of another event is assumed.
Conditional probability is quite useful especially when only partial information is
available and, even when we can obtain some probability directly, we can often obtain
the same result more easily by using conditioning in many situations.

Definition 2.4.1 (conditional probability) The probability of an event under the assumption that another event has occurred is called conditional probability. Specifically, the conditional probability, denoted by P(A|B), of event A under the assumption that event B has occurred is defined as

P(A|B) = P(A ∩ B)/P(B)    (2.4.1)

when P(B) > 0.

In other words, the conditional probability P(A|B) of event A under the assump-
tion that event B has occurred is the probability of A with the sample space Ω replaced
by the conditioning event B. Often, the event A conditioned on B is denoted by A|B.
From (2.4.1), we easily get

P(A|B) = P(A)/P(B)    (2.4.2)

when A ⊆ B and

P(A|B) = 1    (2.4.3)

when B ⊆ A.
Example 2.4.1 When the conditioning event B is the sample space Ω, we have
P(A|Ω) = P(A) because A ∩ B = A ∩ Ω = A and P(B) = P(Ω) = 1. ♦
Example 2.4.2 Consider the rolling of a fair die. Assume that we know the outcome is an even number. Obtain the probability that the outcome is 2.
Solution Let A = {2} and B = {2, 4, 6}. Then, because A ∩ B = A, P(A ∩ B) = P(A) = 1/6, and P(B) = 1/2 ≠ 0, we have P(A|B) = (1/6)(1/2)⁻¹ = 1/3. Again, P(A|B) is the probability of A = {2} when B = {an even number} = {2, 4, 6} is assumed as the sample space. ♦
Example 2.4.3 The probability for any child to be a girl is α and is not influenced by other children. Assume that Dr. Kim has two children. Obtain the probabilities in the following two separate cases: (1) the probability p1 that the younger child is a daughter when Dr. Kim says "The elder one is a daughter", and (2) the probability p2 that the other child is a daughter when Dr. Kim says "One of my children is a daughter".
Solution We have p1 = P(D2|D1) = P(D1 ∩ D2)/P(D1) = α²/α = α, where D1 and D2 denote the events of the first and second child being a daughter, respectively. Similarly, because P(CA) = P(D1 ∩ D2) + P(D1 ∩ B2) + P(B1 ∩ D2) = 2α − α², we get p2 = P(CB|CA) = P(CA ∩ CB)/P(CA) = α²/(2α − α²) = α/(2 − α), where B1 and B2 denote the events of the first and second child being a boy, respectively, and CA and CB denote the events of one and the other child being a daughter, respectively. ♦
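The two answers can be double-checked by enumerating the four (elder, younger) outcomes with their joint probabilities. A minimal sketch with an arbitrarily chosen α = 0.3:

```python
from itertools import product

alpha = 0.3
p_child = {"G": alpha, "B": 1 - alpha}   # each child independently a girl w.p. alpha

# joint probabilities of the four (elder, younger) outcomes
joint = {(e, y): p_child[e] * p_child[y] for e, y in product("GB", repeat=2)}

# (1) P(younger is a girl | elder is a girl)
p1 = joint[("G", "G")] / sum(p for (e, _), p in joint.items() if e == "G")

# (2) P(both are girls | at least one is a girl)
p2 = joint[("G", "G")] / sum(p for key, p in joint.items() if "G" in key)

print(round(p1, 6))                        # equals alpha
print(round(p2, 6))                        # equals alpha / (2 - alpha)
```

The contrast between the two conditioning events, "the elder is a girl" versus "at least one is a girl", is what makes p2 smaller than p1.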

2.4.1 Total Probability and Bayes’ Theorems

Theorem 2.4.1 When P(Bi | B1 B2 ⋯ Bi−1) > 0 for i = 2, 3, …, n, the probability of the intersection B1 ∩ B2 ∩ ⋯ ∩ Bn of {Bi}_{i=1}^{n} can be written as

P(B1 ∩ B2 ∩ ⋯ ∩ Bn) = P(B1) P(B2|B1) ⋯ P(Bn | B1 B2 ⋯ Bn−1),    (2.4.4)

which is called the multiplication theorem. Similarly, the probability of the union B1 ∪ B2 ∪ ⋯ ∪ Bn can be expressed as

P(B1 ∪ B2 ∪ ⋯ ∪ Bn) = P(B1) + P(B1^c B2) + ⋯ + P(B1^c B2^c ⋯ Bn−1^c Bn),    (2.4.5)

which is called the addition theorem.


Note that (2.4.5) is the same as (2.3.4). Next, (2.4.4) can be written as

P(B1 ∩ B2) = P(B1) P(B2|B1) = P(B2) P(B1|B2)    (2.4.6)

when n = 2. Now, from 1 = P(Ω|B2) = P(B1^c ∪ B1 | B2) = P(B1^c|B2) + P(B1|B2), we have P(B1^c|B2) = 1 − P(B1|B2). Using this result and (2.4.6), (2.4.5) for n = 2 can be written as P(B1 ∪ B2) = P(B1) + P(B1^c B2) = P(B1) + P(B2) P(B1^c|B2) = P(B1) + P(B2){1 − P(B1|B2)}, i.e.,

P(B1 ∪ B2) = P(B1) + P(B2) − P(B2) P(B1|B2)
           = P(B1) + P(B2) − P(B1 ∩ B2),    (2.4.7)

which is the same as (2.3.5).


Example 2.4.4 Assume a box with three red and two green balls. We randomly
choose one ball and then another without replacement. Find the probability PRG that
the first ball is red and the second ball is green.
Solution (Method 1) The number of all possible ways of choosing two balls in order is 5P2, among which the number of desired outcomes is 3P1 × 2P1. Therefore, PRG = (3P1 × 2P1)/5P2 = 6/20 = 3/10.
(Method 2) Let A = {the first ball is red} and B = {the second ball is green}. Then, we have P(A) = 3/5. In addition, P(B|A) is the conditional probability that the second ball is green when the first ball is red, and is thus the probability of choosing a green ball from the remaining two red and two green balls after a red ball has been chosen: in other words, P(B|A) = 2/4. Consequently, P(AB) = P(B|A)P(A) = 3/10. ♦
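Method 1 of Example 2.4.4 amounts to counting ordered draws; a quick cross-check listing all 5P2 = 20 equally likely permutations (Python, stdlib only):

```python
from itertools import permutations
from fractions import Fraction

balls = ["R1", "R2", "R3", "G1", "G2"]   # three red, two green, labeled

draws = list(permutations(balls, 2))      # ordered draws without replacement
favorable = [d for d in draws if d[0][0] == "R" and d[1][0] == "G"]

P_RG = Fraction(len(favorable), len(draws))
print(P_RG)  # 3/10
```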
Consider two events A and B. The event A can be expressed as

A = AB ∪ AB^c,    (2.4.8)

based on which we have P(A) = P(AB) + P(AB^c) = P(A|B)P(B) + P(A|B^c)P(B^c), i.e.,

P(A) = P(A|B)P(B) + P(A|B^c){1 − P(B)}    (2.4.9)

from (2.2.14) because AB and AB^c are mutually exclusive. The result (2.4.9) shows that the probability of A is the weighted sum of the conditional probabilities of A given B and given B^c, with the weights being the probabilities of the conditioning events B and B^c, respectively.

The result (2.4.9) is quite useful when a direct calculation of the probability of an
event is not straightforward.
Example 2.4.5 In Box 1, we have two white and four black balls. Box 2 contains one white ball and one black ball. We randomly take one ball from Box 1, put it into Box 2, and then randomly take one ball from Box 2. Find the probability PW that the ball taken from Box 2 is white.
Solution Let the events of a white ball from Box 1 and a white ball from Box 2 be C and D, respectively. Then, because P(C) = 1/3, P(D|C) = 2/3, P(D|C^c) = 1/3, and P(C^c) = 1 − P(C) = 2/3, we get PW = P(D) = P(D|C)P(C) + P(D|C^c)P(C^c) = 4/9. ♦
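The total-probability computation of Example 2.4.5 can be reproduced with exact fractions (a sketch under the stated setup: one ball is moved from Box 1 to Box 2 before the second draw):

```python
from fractions import Fraction

P_C = Fraction(2, 6)        # white from Box 1 (two white, four black)
P_Cc = 1 - P_C

# After the transfer, Box 2 holds three balls.
P_D_given_C = Fraction(2, 3)    # a white was added: two white out of three
P_D_given_Cc = Fraction(1, 3)   # a black was added: one white out of three

# Total probability, as in (2.4.9).
P_W = P_D_given_C * P_C + P_D_given_Cc * P_Cc
print(P_W)  # 4/9
```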
Example 2.4.6 The two numbers of the upward faces are added after rolling a pair
of fair dice. Obtain the probability α that 5 appears before 7 when we continue the
rolling until the outcome is 5 or 7.
Solution Let An and Bn be the events that the outcome is neither 5 nor 7 and that the outcome is 5, respectively, at the n-th rolling for n = 1, 2, …. Then, P(An) = 1 − P(5) − P(7) = 13/18 from P(Bn) = P(5) = 4/36 = 1/9 and P(7) = 6/36 = 1/6. Now, α = P(B1 ∪ (A1 B2) ∪ (A1 A2 B3) ∪ ⋯) = Σ_{n=1}^{∞} P(A1 A2 ⋯ An−1 Bn) from (2.4.5) because {A1 A2 ⋯ An−1 Bn}_{n=1}^{∞} are mutually exclusive, where we assume A0 B1 = B1. Here,

P(A1 A2 ⋯ An−1 Bn) = P(Bn | A1 A2 ⋯ An−1) P(A1 A2 ⋯ An−1)
                   = (1/9)(13/18)^{n−1}.    (2.4.10)

Therefore, α = Σ_{n=1}^{∞} (1/9)(13/18)^{n−1} = 2/5. ♦

Example 2.4.7 Example 2.4.6 can be viewed in a more intuitive way as follows:
Consider two mutually exclusive events A and B from a random experiment and
assume the experiments are repeated. Then, the probability that A occurs before B
can be obtained as

probability of A / probability of (A or B) = P(A) / {P(A) + P(B)}.    (2.4.11)

Solving Example 2.4.6 based on (2.4.11), we get P(5 appears before 7) = (1/9) / (1/9 + 1/6) = 2/5. ♦
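The shortcut (2.4.11) agrees with a direct simulation of Example 2.4.6; the sketch below compares the closed form with a Monte Carlo estimate (fixed seed, so the run is reproducible):

```python
import random
from fractions import Fraction

# Closed form (2.4.11): P(5 before 7) = P(5) / {P(5) + P(7)}.
p5, p7 = Fraction(4, 36), Fraction(6, 36)
alpha = p5 / (p5 + p7)
assert alpha == Fraction(2, 5)

# Monte Carlo check: roll a pair of dice until the sum is 5 or 7.
rng = random.Random(0)

def five_before_seven():
    while True:
        s = rng.randint(1, 6) + rng.randint(1, 6)
        if s in (5, 7):
            return s == 5

trials = 100_000
estimate = sum(five_before_seven() for _ in range(trials)) / trials
assert abs(estimate - 0.4) < 0.01  # within sampling error of 2/5
```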
Let us now generalize the number of conditioning events in (2.4.9). Assume a collection {Bj}_{j=1}^{n} of mutually exclusive events and let A ⊆ B1 ∪ B2 ∪ ⋯ ∪ Bn. Then, P(A) = P(∪_{j=1}^{n} (A ∩ Bj)) can be expressed as

P(A) = Σ_{j=1}^{n} P(ABj)    (2.4.12)

because A = A ∩ (B1 ∪ B2 ∪ ⋯ ∪ Bn) = ∪_{j=1}^{n} (A ∩ Bj) and {A ∩ Bj}_{j=1}^{n} are all mutually exclusive. Now, recollecting that P(ABj) = P(A|Bj) P(Bj), we get the following theorem, called the total probability theorem:

Theorem 2.4.2 We have

P(A) = Σ_{j=1}^{n} P(A|Bj) P(Bj)    (2.4.13)

when {Bj}_{j=1}^{n} is a collection of disjoint events and A ⊆ B1 ∪ B2 ∪ ⋯ ∪ Bn.

Example 2.4.8 Let A = {1, 2, 3} in the experiment of rolling a fair die. When B1 = {1, 2} and B2 = {3, 4, 5, 6}, we have P(A) = 1/2 = 1 × 1/3 + 1/4 × 2/3 because P(B1) = 1/3, P(B2) = 2/3, P(A|B1) = P(AB1)/P(B1) = (1/3)(1/3)^{-1} = 1, and P(A|B2) = P(AB2)/P(B2) = (1/6)(2/3)^{-1} = 1/4. Similarly, when B1 = {1} and B2 = {2, 3, 5}, we get P(A) = 1/2 = 1 × 1/6 + 2/3 × 1/2 from P(B1) = 1/6, P(B2) = 1/2, P(A|B1) = P(AB1)/P(B1) = 1, and P(A|B2) = P(AB2)/P(B2) = 2/3. ♦

Example 2.4.9 Assume a group comprising 60% women and 40% men. Among the
women, 45% play violin, and 25% of the men play violin. A person chosen randomly
from the group plays violin. Find the probability that the person is a man.

Solution Denote the events of a person being a man and a woman by M and W, respectively, and of playing violin by V. Then, using (2.4.1) and (2.4.13), we get P(M|V) = P(MV)/P(V) = P(V|M)P(M) / {P(V|M)P(M) + P(V|W)P(W)} = 10/37 because M^c = W, P(W) = 0.6, P(M) = 0.4, P(V|W) = 0.45, and P(V|M) = 0.25. ♦

Consider a collection {Bi}_{i=1}^{n} of events and an event A with P(A) > 0. Then, we have

P(Bk|A) = P(A|Bk) P(Bk) / P(A)    (2.4.14)

because P(Bk|A) = P(Bk A)/P(A) and P(Bk A) = P(A|Bk) P(Bk) from the definition of conditional probability. Now, combining the results (2.4.13) and (2.4.14) when the events {Bi}_{i=1}^{n} are all mutually exclusive and A ⊆ B1 ∪ B2 ∪ ⋯ ∪ Bn, we get the following result, called Bayes' theorem:

Theorem 2.4.3 We have

P(Bk|A) = P(A|Bk) P(Bk) / Σ_{j=1}^{n} P(A|Bj) P(Bj)    (2.4.15)

when {Bj}_{j=1}^{n} is a collection of disjoint events, A ⊆ B1 ∪ B2 ∪ ⋯ ∪ Bn, and P(A) ≠ 0.

It should be noted in applying Theorem 2.4.3 that, when A is not a subset of B1 ∪ B2 ∪ ⋯ ∪ Bn, using (2.4.15) to obtain P(Bk|A) will yield an incorrect value. This is because P(A) ≠ Σ_{j=1}^{n} P(A|Bj) P(Bj) when A is not a subset of B1 ∪ B2 ∪ ⋯ ∪ Bn. To obtain P(Bk|A) correctly in such a case, we must use (2.4.14), i.e., (2.4.15) with the denominator Σ_{j=1}^{n} P(A|Bj) P(Bj) replaced back with P(A).

Example 2.4.10 In Example 2.4.5, obtain the probability that the ball drawn from
Box 1 is white when the ball drawn from Box 2 is white.
Solution Using the results of Example 2.4.5 and Bayes' theorem, we have P(C|D) = P(CD)/P(D) = P(D|C)P(C)/P(D) = (2/9)/(4/9) = 1/2. ♦
Example 2.4.11 For the random experiment of rolling a fair die, assume A =
{2, 4, 5, 6}, B1 = {1, 2}, and B2 = {3, 4, 5}. Obtain P ( B2 | A).

Solution We easily get P(A) = 2/3. On the other hand, Σ_{j=1}^{2} P(A|Bj) P(Bj) = 1/2 from P(B1) = 1/3, P(B2) = 1/2, P(A|B1) = P(AB1)/P(B1) = (1/6)(1/3)^{-1} = 1/2, and P(A|B2) = P(AB2)/P(B2) = (1/3)(1/2)^{-1} = 2/3. In other words, because A = {2, 4, 5, 6} is not a subset of B1 ∪ B2 = {1, 2, 3, 4, 5}, we have P(A) ≠ Σ_{j=1}^{2} P(A|Bj) P(Bj). Thus, we would get P(B2|A) = P(A|B2)P(B2) / Σ_{j=1}^{2} P(A|Bj)P(Bj) = 2/3, an incorrect answer, if we use (2.4.15) carelessly. The correct answer P(B2|A) = P(A|B2)P(B2)/P(A) = 1/2 can be obtained by using (2.4.14) in this case. ♦
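The caveat above is easy to reproduce numerically for Example 2.4.11: because A is not a subset of B1 ∪ B2, the denominator of (2.4.15) differs from P(A). A minimal sketch (helper name is ours):

```python
from fractions import Fraction

def P(event):
    """Probability of an event under the fair-die measure."""
    return Fraction(len(event), 6)

A = {2, 4, 5, 6}
B1 = {1, 2}
B2 = {3, 4, 5}

numerator = P(A & B2)                           # = P(A|B2) P(B2)
careless = numerator / (P(A & B1) + P(A & B2))  # denominator of (2.4.15)
correct = numerator / P(A)                      # (2.4.14): denominator P(A)

print(careless, correct)  # 2/3 1/2
```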
Let us consider an example of the application of Bayes' theorem.
Example 2.4.12 Assume four boxes with 2000, 500, 1000, and 1000 parts of a
machine, respectively. The probability of a part being defective is 0.05, 0.4, 0.1, and
0.1, respectively, for the four boxes.
(1) When a box is chosen at random and then a part is picked randomly from the
box, calculate the probability that the part is defective.

(2) Assuming the part picked is defective, calculate the probability that the part is from the second box.
(3) Assuming the part picked is defective, calculate the probability that the part is from the third box.
Solution Let A and Bi be the events that the part picked is defective and that the part is from the i-th box, respectively. Then, P(Bi) = 1/4 for i = 1, 2, 3, 4. In addition, the value of P(A|B2), for instance, is 0.4 because P(A|B2) denotes the probability that a part picked is defective when it is from the second box.
(1) Noting that {Bi}_{i=1}^{4} are all disjoint, we get P(A) = Σ_{i=1}^{4} P(A|Bi) P(Bi) = (1/4) × 0.05 + (1/4) × 0.4 + (1/4) × 0.1 + (1/4) × 0.1, i.e.,

P(A) = 13/80    (2.4.16)

from (2.4.13).
(2) The probability to obtain is P(B2|A). We get P(B2|A) = P(A|B2)P(B2) / Σ_{j=1}^{4} P(A|Bj)P(Bj) = (0.4 × 1/4)/(13/80) = 8/13 as shown in (2.4.15) because P(A) = 13/80 from (2.4.16), P(B2) = 1/4, and P(A|B2) = 0.4.
(3) Similarly, we get^{12} P(B3|A) = (0.1 × 1/4)/(13/80) = 2/13 from P(B3|A) = P(A|B3)P(B3)/P(A), P(B3) = 1/4, P(A|B3) = 0.1, and P(A) = 13/80.

If we calculate similarly the probabilities for the first and fourth boxes and then add the four values in Example 2.4.12, we will get 1: in other words, we have Σ_{i=1}^{4} P(Bi|A) = P(Ω|A) = 1.
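The numbers of Example 2.4.12, together with the closing remark that the four posteriors add to one, can be verified exactly:

```python
from fractions import Fraction

priors = [Fraction(1, 4)] * 4                    # P(B_i): a box chosen at random
defect = [Fraction(1, 20), Fraction(2, 5),
          Fraction(1, 10), Fraction(1, 10)]      # P(A|B_i) for the four boxes

# Total probability theorem (2.4.13).
P_A = sum(d * p for d, p in zip(defect, priors))
assert P_A == Fraction(13, 80)

# Bayes' theorem (2.4.15): posterior probability of each box.
posterior = [d * p / P_A for d, p in zip(defect, priors)]
assert posterior[1] == Fraction(8, 13)
assert posterior[2] == Fraction(2, 13)
assert sum(posterior) == 1
```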

2.4.2 Independent Events

Assume two boxes. Box 1 contains one red ball and two green balls, and Box 2 contains two red balls and four green balls. If we pick a ball randomly after choosing a box with probability P(Box 1) = p = 1 − P(Box 2), then we have P(red ball) = P(red ball|Box 1)P(Box 1) + P(red ball|Box 2)P(Box 2) = (1/3)p + (1/3)(1 − p) = 1/3 and P(green ball) = 2/3. Note that

P(red ball) = P(red ball|Box 1) = P(red ball|Box 2)    (2.4.17)

12 Here, 13/80 = 0.1625, 8/13 ≈ 0.6154, and 2/13 ≈ 0.1538.

and P(green ball) = P(green ball|Box 1) = P(green ball|Box 2): whichever box we choose, and whatever the probability of choosing a box may be, the probabilities that the ball picked is red and green are 1/3 and 2/3, respectively. In other words, the choice
of a box does not influence the probability of the color of the ball picked. On the
other hand, if Box 1 contains one red ball and two green balls and Box 2 contains
two red balls and one green ball, the choice of a box will influence the probability
of the color of the ball picked. Such an influence is commonly represented by the
notion of independence.
Definition 2.4.2 (independence of two events) If the probability P(AB) of the inter-
section of two events A and B is equal to the product P(A)P(B) of the probabilities
of the two events, i.e., if

P(AB) = P(A)P(B), (2.4.18)

then A and B are called independent (of each other) or mutually independent.
Example 2.4.13 Assume the sample space S = {1, 2, …, 9} and P(k) = 1/9 for k = 1, 2, …, 9. Consider the events A = {1, 2, 3} and B = {3, 4, 5}. Then, P(A) = 1/3, P(B) = 1/3, and P(AB) = P(3) = 1/9, and therefore P(AB) = P(A)P(B). Thus, A and B are independent of each other. Likewise, for the sample space S = {1, 2, …, 6}, the events C = {1, 2, 3} and D = {3, 4} are independent of each other when P(k) = 1/6 for k = 1, 2, …, 6. ♦
When one of two events has probability 1 (e.g., the sample space S) or probability 0 (e.g., the null set ∅), the two events are independent of each other because (2.4.18) holds true.
Theorem 2.4.4 An event with probability 1 or 0 is independent of any other event.
Example 2.4.14 Assume the sample space S = {1, 2, …, 5} and let P(k) = 1/5 for k = 1, 2, …, 5. Then, no two events, excluding S and ∅, are independent of each other. When P(1) = 1/10, P(2) = P(3) = P(4) = 1/5, and P(5) = 3/10 for the sample space S = {1, 2, …, 5}, the events A = {3, 4} and B = {4, 5} are independent because P(A)P(B) = P(4) from P(A) = 2/5, P(B) = 1/2, and P(4) = 1/5. ♦
In general, two mutually exclusive events are not independent of each other: on
the other hand, we have the following theorem from Theorem 2.4.4:
Theorem 2.4.5 Two mutually exclusive events are independent of each other if at least one of the two events has probability 0.

Example 2.4.15 For the sample space S = {1, 2, 3}, let the power set 2^S = {∅, {1}, {2}, …, S} be the event space. Assume the probability measure with P(1) = 0, P(2) = 1/3, and P(3) = 2/3. Then, the events {2} and {3} are mutually exclusive, but not independent of each other because P(2)P(3) = 2/9 ≠ 0 = P(∅). On the other hand, the events {1} and {2} are mutually exclusive and, at the same time, independent of each other. ♦

From P(A^c) = 1 − P(A) and the definition (2.4.1) of conditional probability, we can show the following theorem:

Theorem 2.4.6 If the events A and B are independent of each other, then A and B c
are also independent of each other, P(A|B) = P(A), and P(B|A) = P(B).

Example 2.4.16 Assume the sample space S = {1, 2, …, 6} and probability measure P(k) = 1/6 for k = 1, 2, …, 6. The events A = {1, 2, 3} and B = {3, 4} are independent of each other, as we have already observed in Example 2.4.13. Here, B^c = {1, 2, 5, 6} and thus P(B^c) = 2/3 and P(AB^c) = P({1, 2}) = 1/3. In other words, A and B^c are independent of each other because P(AB^c) = P(A)P(B^c). We also have P(A|B) = 1/2 = P(A) and P(B|A) = 1/3 = P(B). ♦

Definition 2.4.3 (independence of a number of events) The events {Ai}_{i=1}^{n} are called independent of each other if they satisfy

P(∩_{i∈J} Ai) = Π_{i∈J} P(Ai)    (2.4.19)

for every finite subset J of {1, 2, …, n}.

Example 2.4.17 When A, B, and C are independent of each other with P(AB) = 1/3, P(BC) = 1/6, and P(AC) = 2/9, obtain the probability of C.

Solution First, P(A) = 2/3 because P(B)P(C) = 2/[27{P(A)}²] = 1/6 from P(B) = 1/{3P(A)} and P(C) = 2/{9P(A)}. Thus, P(C) = 1/3 from P(C) = 2/{9P(A)}. ♦

A number of events {Ai}_{i=1}^{n} with n = 3, 4, … may or may not be independent of each other even when Ai and Aj are independent of each other for every possible pair {i, j}. When only all pairs of two events are independent, the events {Ai}_{i=1}^{n} with n = 3, 4, … are called pairwise independent.

Example 2.4.18 For the sample space Ω = {1, 2, 3, 4} of equally likely outcomes, consider A1 = {1, 2}, A2 = {2, 3}, and A3 = {1, 3}. Then, A1 and A2 are independent of each other, A2 and A3 are independent of each other, and A3 and A1 are independent of each other because P(A1) = P(A2) = P(A3) = 1/2, P(A1 A2) = P({2}) = 1/4, P(A2 A3) = P({3}) = 1/4, and P(A3 A1) = P({1}) = 1/4. However, A1, A2, and A3 are not independent of each other because P(A1 A2 A3) = P(∅) = 0 is not equal to P(A1) P(A2) P(A3) = 1/8. ♦

Example 2.4.19 A malfunction of a circuit element does not influence that of another circuit element. Let the probability for a circuit element to function normally be p. Obtain the probabilities PS and PP that the circuit will function normally when n circuit elements are connected in series and in parallel, respectively.

Solution When the circuit elements are connected in series, every circuit element should function normally for the circuit to function normally. Thus, we have

PS = p^n.    (2.4.20)

On the other hand, when the elements are connected in parallel, the circuit will function normally if at least one of the circuit elements functions normally. Therefore, we get

PP = 1 − (1 − p)^n    (2.4.21)

because the complement of the event that at least one of the circuit elements functions normally is the event that all the elements malfunction. Note that 1 − (1 − p)^n > p^n for n = 2, 3, … when p ∈ (0, 1), with equality at n = 1. ♦
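Formulas (2.4.20) and (2.4.21) of Example 2.4.19, and the inequality between them for n ≥ 2, can be sketched as:

```python
def series_reliability(p, n):
    """All n independent elements must work: P_S = p**n, per (2.4.20)."""
    return p ** n

def parallel_reliability(p, n):
    """At least one of n independent elements works: P_P = 1 - (1-p)**n, per (2.4.21)."""
    return 1 - (1 - p) ** n

p = 0.9
# Equal when n = 1; strictly larger in parallel for n >= 2.
assert abs(series_reliability(p, 1) - parallel_reliability(p, 1)) < 1e-12
for n in range(2, 10):
    assert parallel_reliability(p, n) > series_reliability(p, n)
```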

2.5 Classes of Probability Spaces

In this section, we introduce the notions of the probability mass function and the probability density function (Kim 2010), which are equivalent to the probability measure for describing a probability space and are more convenient tools when mathematical operations such as differentiation and integration are involved.

2.5.1 Discrete Probability Spaces

In a discrete probability space, in which the sample space Ω is a countable set, we


normally assume Ω = {0, 1, . . .} or Ω = {1, 2, . . .} with the event space F = 2Ω .
Definition 2.5.1 (probability mass function) In a discrete probability space, a func-
tion p(ω) assigning a real number to each sample point ω ∈ Ω and satisfying

p(ω) ≥ 0, ω ∈ Ω (2.5.1)

and

Σ_{ω∈Ω} p(ω) = 1    (2.5.2)

is called a probability mass function (pmf), a mass function, or a mass.


From (2.5.1) and (2.5.2), we have

p(ω) ≤ 1 (2.5.3)

for every ω ∈ Ω.

Example 2.5.1 For the sample space Ω = J0 = {0, 1, …} and pmf

p(x) = 1/3, x = 0;  c, x = 1;  2c, x = 2;  0, otherwise,    (2.5.4)

determine the constant c.

Solution From (2.5.2), Σ_{x∈Ω} p(x) = 1/3 + 3c = 1. Thus, c = 2/9. ♦
x∈Ω

The probability measure P can be expressed as

P(F) = Σ_{ω∈F} p(ω)    (2.5.5)

for F ∈ F in terms of the pmf p. Conversely, the pmf p can be written as

p(ω) = P({ω})    (2.5.6)

in terms of the probability measure P.


Note that a pmf is defined for sample points and the probability measure for events.
Both the probability measure P and pmf p can be used to describe the randomness
of the outcomes of an experiment. Yet, the pmf is easier than the probability measure
to deal with, especially when mathematical operations such as sum and difference
are involved.
Some of the typical examples of pmf are provided in the examples below.

Example 2.5.2 For the sample space Ω = {x1, x2} and a number α ∈ (0, 1), the function

p(x) = 1 − α, x = x1;  α, x = x2    (2.5.7)

is called a two-point pmf. ♦

Definition 2.5.2 (Bernoulli trial) An experiment with two possible outcomes, i.e.,
an experiment for which the sample space has two elements, is called a Bernoulli
experiment or a Bernoulli trial.

Example 2.5.3 When x1 = 0 and x2 = 1 in the two-point pmf (2.5.7), we have

p(x) = 1 − α, x = 0;  α, x = 1,    (2.5.8)

which is called the binary pmf or Bernoulli pmf. The binary distribution is usually denoted by b(1, α), where 1 signifies the number of Bernoulli trials and α represents the probability of the desired event, or success. ♦

Example 2.5.4 In the experiment of rolling a fair die, assume the events A = {1, 2, 3, 4} and A^c = {5, 6}. Then, if we choose A as the desired event, the distribution of A is b(1, 2/3). ♦

Example 2.5.5 When the sample space is Ω = Jn = {0, 1, …, n − 1}, the pmf

p(k) = 1/n, k ∈ Jn    (2.5.9)

is called a uniform pmf. ♦

Example 2.5.6 For the sample space Ω = J0 = {0, 1, …} and a number α ∈ (0, 1), the pmf

p(k) = (1 − α)^k α, k ∈ Ω    (2.5.10)

is called a geometric pmf. The distribution represented by the geometric pmf (2.5.10) is called the geometric distribution with parameter α and denoted by Geom(α). ♦

When a Bernoulli trial with probability α of success is repeated until the first success, the distribution of the number of failures is Geom(α). In some cases, the function p(k) = (1 − α)^{k−1} α for k ∈ {1, 2, …} with α ∈ (0, 1) is called the geometric pmf. In such a case, the distribution of the number of repetitions is Geom(α) when a Bernoulli trial with probability α of success is repeated until the first success.

Example 2.5.7 Based on the binary pmf discussed in Example 2.5.3, let us introduce the binomial pmf. Consider the sample space Ω = Jn+1 = {0, 1, …, n} and a number α ∈ (0, 1). Then, the function

p(k) = nCk α^k (1 − α)^{n−k}, k ∈ Jn+1    (2.5.11)

is called a binomial pmf and the distribution is denoted by b(n, α). ♦

In (2.5.11), the number nCr = n!/{(n − r)! r!}, also denoted by (n r), is the coefficient of x^r y^{n−r} in the expansion of (x + y)^n, and is thus called the binomial coefficient, as we have described in (1.4.60). Figure 2.7 shows the envelopes of the binomial pmf for some values of n when α = 0.4. The binomial pmf is discussed in more detail in Sect. 3.5.2.
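The binomial pmf (2.5.11) can be tabulated with math.comb; the sketch below checks the normalization (2.5.2) and that b(1, α) reduces to the Bernoulli pmf (2.5.8):

```python
from math import comb

def binomial_pmf(n, alpha):
    """pmf of b(n, alpha) on {0, 1, ..., n}, per (2.5.11)."""
    return [comb(n, k) * alpha**k * (1 - alpha)**(n - k) for k in range(n + 1)]

p = binomial_pmf(10, 0.4)
assert abs(sum(p) - 1) < 1e-12       # (2.5.2): the pmf sums to one

b1 = binomial_pmf(1, 0.4)            # reduces to the Bernoulli pmf (2.5.8)
assert abs(b1[0] - 0.6) < 1e-12 and abs(b1[1] - 0.4) < 1e-12
```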

Example 2.5.8 For the sample space Ω = J0 = {0, 1, . . .} and a number λ ∈ (0, ∞),
the function

p(k) = (λ^k / k!) e^{−λ}, k ∈ J0    (2.5.12)

is called a Poisson pmf and the distribution is denoted by P(λ). ♦


Fig. 2.7 Envelopes of binomial pmf (α = 0.4; n = 10, 50, 100, 150)

Fig. 2.8 Poisson pmf (∗: λ = 0.5, ◦: λ = 3; for λ = 0.5, p(0) = 1/√e ≈ 0.61)

For the Poisson pmf (2.5.12), recollecting p(k + 1)/p(k) = λ/(k + 1), we have p(0) ≤ p(1) ≤ ⋯ ≤ p(λ − 1) = p(λ) ≥ p(λ + 1) ≥ p(λ + 2) ≥ ⋯ when λ is an integer, and p(0) ≤ p(1) ≤ ⋯ ≤ p(⌊λ⌋ − 1) ≤ p(⌊λ⌋) and p(⌊λ⌋) ≥ p(⌊λ⌋ + 1) ≥ ⋯ when λ is not an integer, where the floor function ⌊x⌋ is defined following (1.A.44) in Appendix 1.2. Figure 2.8 shows two examples of the Poisson pmf. The Poisson pmf will be discussed in more detail in Sect. 3.5.3.
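The mode behavior just described follows from the ratio p(k + 1)/p(k) = λ/(k + 1); a numerical check that the Poisson pmf peaks at ⌊λ⌋, with a tie at λ − 1 for integer λ:

```python
from math import exp, factorial, floor

def poisson_pmf(lam, k):
    """Poisson pmf (2.5.12): p(k) = (lam**k / k!) e**(-lam)."""
    return lam**k / factorial(k) * exp(-lam)

for lam in (0.5, 3.0, 3.7):
    probs = [poisson_pmf(lam, k) for k in range(40)]
    mode = max(range(40), key=probs.__getitem__)  # first maximizer
    if lam == int(lam):
        # Integer lam: p(lam - 1) = p(lam), so the first maximizer is lam - 1.
        assert mode == lam - 1 and probs[mode] == probs[int(lam)]
    else:
        assert mode == floor(lam)
```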

Example 2.5.9 For the sample space Ω = J0 = {0, 1, …}, r ∈ (0, ∞), and α ∈ (0, 1), the function

p(x) = −rCx α^r (α − 1)^x, x ∈ J0    (2.5.13)

is called a negative binomial (NB) pmf, and the distribution with the pmf (2.5.13) is denoted by NB(r, α). ♦
When r = 1, the NB pmf (2.5.13) is the geometric pmf discussed in Example
2.5.6. The NB pmf with r a natural number and a real number is called the Pascal
pmf and Polya pmf, respectively.
The meaning of NB(r, α) and the formula of the NB pmf vary depending on whether the sample space is {0, 1, …} or {r, r + 1, …}, whether r represents a success or a failure, and whether α is the probability of success or of failure. In (2.5.13), the parameters r and α represent the number and probability of success, respectively. When a Bernoulli trial with the probability α of success is repeated until the r-th success, the distribution of the number of failures is NB(r, α).

We clearly have Σ_{x=0}^{∞} p(x) = 1 because Σ_{x=0}^{∞} −rCx (α − 1)^x = (1 + α − 1)^{−r} = α^{−r} from (1.A.12) with p = −r and z = α − 1. Now, the pmf (2.5.13) can be written as p(x) = r+x−1Cx α^r (1 − α)^x or, equivalently, as

p(x) = r+x−1Cr−1 α^r (1 − α)^x, x ∈ J0    (2.5.14)

using^{13} −rCx = {(−r)(−r − 1) ⋯ (−r − x + 1)}/x!, i.e.,

−rCx = (−1)^x r+x−1Cx.    (2.5.15)

Note that we have

Σ_{x=0}^{∞} r+x−1Cx (1 − α)^x = α^{−r}    (2.5.16)

because Σ_{x=0}^{∞} r+x−1Cx (1 − α)^x = Σ_{x=0}^{∞} {(r + x − 1)!/((r − 1)! x!)} (1 − α)^x = Σ_{x=0}^{∞} r+x−1Cr−1 (1 − α)^x and Σ_{x=0}^{∞} p(x) = 1. Letting x + r = y in (2.5.14), we get

p(y) = y−1Cr−1 α^r (1 − α)^{y−r}, y = r, r + 1, …    (2.5.17)

when r is a natural number, which is sometimes also called the NB pmf. Here, note that x+r−1Cx evaluated at x = y − r equals y−1Cy−r = y−1Cr−1.

13 Here, (−r)(−r − 1) ⋯ (−r − x + 1) = (−1)^x r(r + 1) ⋯ (r + x − 1). Equation (2.5.15) can also be obtained based on Table 1.4.
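Identity (2.5.15) and the normalization of the NB pmf (2.5.13) can be verified numerically; the helper gen_binom below (our name) computes the generalized binomial coefficient:

```python
from math import comb, factorial, prod
from fractions import Fraction

def gen_binom(p, x):
    """Generalized binomial coefficient pCx = p(p-1)···(p-x+1)/x!."""
    return Fraction(prod(p - i for i in range(x)), factorial(x))

# Identity (2.5.15): (-r)Cx = (-1)**x * (r+x-1)Cx.
for r in range(1, 6):
    for x in range(8):
        assert gen_binom(-r, x) == (-1)**x * comb(r + x - 1, x)

# Normalization of the NB pmf (2.5.13) for NB(3, 1/2), truncated at 200 terms.
r, alpha = 3, Fraction(1, 2)
partial = sum(gen_binom(-r, x) * alpha**r * (alpha - 1)**x for x in range(200))
assert abs(float(partial) - 1) < 1e-12
```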

2.5.2 Continuous Probability Spaces

Let us now consider the continuous probability space with the measurable space
(Ω, F) = (R, B (R)): in other words, the sample space Ω is the set R of real numbers
and the event space is the Borel field B (R).
Definition 2.5.3 (probability density function) In a measurable space (R, B(R)), a real-valued function f with the two properties

f(r) ≥ 0, r ∈ Ω    (2.5.18)

and

∫_Ω f(r) dr = 1    (2.5.19)

is called a probability density function (pdf), a density function, or a density.


Example 2.5.10 Determine the constant c when the pdf is f(x) = 1/4, c, and 0 for x ∈ [0, 1), [1, 2), and [0, 2)^c, respectively.

Solution From (2.5.19), we have ∫_{−∞}^{∞} f(r) dr = ∫_0^1 (1/4) dr + ∫_1^2 c dr = 1/4 + c = 1. Thus, c = 3/4. ♦
The value f (x) of a pdf f does not represent the probability P({x}). Instead, the
set function P defined in terms of f as

P(F) = ∫_F f(r) dr, F ∈ B(R)    (2.5.20)

is the probability measure of the probability space on which f is defined. Note that
(2.5.20) is a counterpart of (2.5.5). While we have (2.5.6), an equation describing
the pmf in terms of the probability measure in the discrete probability space, we do
not have its counterpart in the continuous probability space, which would describe
the pdf in terms of the probability measure.
Do the integrals in (2.5.19) and (2.5.20) have any meaning? For interval events
or finite unions of interval events, we can adopt the Riemann integral as in most
engineering problems and calculations. On the other hand, the Riemann integral has
some caveats including that the order of the limit and integral for a sequence of
functions is not interchangeable. In addition, the Riemann integral is not defined in
some cases. For example, when

f(r) = 1, r ∈ [0, 1];  0, otherwise,    (2.5.21)

it is not possible to obtain the Riemann integral of f (r ) over the set F = {r :


r is an irrational number, r ∈ [0, 1]}. Fortunately, such a caveat can be overcome

by adopting the Lebesgue integral. Compared to the Riemann integral, the Lebesgue
integral has the following three important advantages:
(1) The Lebesgue integral is defined for any Borel set.
(2) The order of the limit and integral can almost always be interchanged in the
Lebesgue integral.
(3) When a function is Riemann integrable, it is also Lebesgue integrable, and the
results are known to be the same.

Like the pmf, the pdf is defined on the points in the sample space, not on the
events. On the other hand, unlike the pmf p(·) for which p(ω) directly represents
the probability P({ω}), the value f (x0 ) at a point x0 of the pdf f (x) is not the
probability at x = x0 . Instead, f (x0 ) d x represents the probability for the arbitrarily
small interval [x0 , x0 + d x). While the value of a pmf cannot be larger than 1 at
any point, the value of a pdf can be larger than 1 at some points. In addition, the
probability of a countable event is 0 even when the value of the pdf is not 0 in the
continuous space: for the pdf

f(x) = 2, x ∈ [0, 0.5];  0, otherwise,    (2.5.22)

we have P({a}) = 0 for any point a ∈ [0, 0.5]. On the other hand, if we assume a
very small interval around a point, the probability of that interval can be expressed
as the product of the value of the pdf and the length of the interval. For example, for
a pdf f with f (3) = 4 the probability P([3, 3 + d x)) of an arbitrarily small interval
[3, 3 + d x) near 3 is

f (3)d x = 4d x. (2.5.23)

This implies that, as we can obtain the probability of an event by adding the proba-
bility mass over all points in the event in discrete probability spaces, we can obtain
the probability of an event by integrating the probability density over all points in
the event in continuous probability spaces.
Some of the widely-used pdf’s are shown in the examples below.

Example 2.5.11 When a < b, the pdf

f(r) = {1/(b − a)} u(r − a) u(b − r)    (2.5.24)

shown in Fig. 2.9 is called a uniform pdf or a rectangular pdf, and its distribution is denoted by^{14} U(a, b). ♦

The probability measure of U [0, 1] is often called the Lebesgue measure.

14 Notations U [a, b], U [a, b), U (a, b], and U (a, b) are all used interchangeably.

Fig. 2.9 The uniform pdf

Fig. 2.10 The exponential pdf (λ = 1 and λ = 2)

Example 2.5.12 (Romano and Siegel 1986) Countable sets are all of Lebesgue
measure 0. Some uncountable sets such as the Cantor set C described in Example
1.1.46 are also of Lebesgue measure 0. ♦
Example 2.5.13 The pdf

f(r) = λ e^{−λr} u(r)    (2.5.25)

shown in Fig. 2.10 is called an exponential pdf with λ > 0 called the rate of the
pdf. The exponential pdf with λ = 1 is called the standard exponential pdf. The
exponential pdf will be discussed again in Sect. 3.5.4. ♦

Example 2.5.14 The pdf

f(r) = (λ/2) e^{−λ|r|}    (2.5.26)

with λ > 0, shown in Fig. 2.11, is called a Laplace pdf or a double exponential pdf, and its distribution is denoted by L(λ). ♦
Example 2.5.15 The pdf

f(r) = {1/√(2πσ²)} exp{−(r − m)²/(2σ²)}    (2.5.27)

shown in Fig. 2.12 is called a Gaussian pdf or a normal pdf, and its distribution is denoted by N(m, σ²). ♦

Fig. 2.11 The Laplace pdf (λ = 1 and λ = 2)

Fig. 2.12 The normal pdf (σ = σ1 and σ = σ2 with σ1 < σ2)

When m = 0 and σ 2 = 1, the normal pdf is called the standard normal pdf. The
normal distribution is sometimes called the Gauss-Laplace distribution, de Moivre-
Laplace distribution, or the second Laplace distribution (Lukacs 1970). The normal
pdf will be addressed again in Sect. 3.5.1 and its generalizations into multidimen-
sional spaces in Chap. 5.

Example 2.5.16 For a positive number α and a real number β, the function

f(r) = (α/π) × 1/{(r − β)² + α²}    (2.5.28)

shown in Fig. 2.13 is called a Cauchy pdf and the distribution is denoted by C(β, α). ♦

The Cauchy pdf is also called the Lorentz pdf or Breit-Wigner pdf. We will mostly consider the case β = 0, with the notation C(α), in this book.

Example 2.5.17 The pdf

f(r) = (r/α²) exp{−r²/(2α²)} u(r)    (2.5.29)

shown in Fig. 2.14 is called a Rayleigh pdf. ♦

Example 2.5.18 When f(v) = a v exp(−v²) u(v) is a pdf, obtain the value of a.

Solution From ∫_{−∞}^{∞} f(v) dv = a ∫_0^∞ v exp(−v²) dv = a/2 = 1, we get a = 2. ♦
Solution From −∞ f (v)dv = a 0 v exp −v 2 dv = a2 = 1, we get a = 2. ♦

Fig. 2.13 The Cauchy pdf (α = 1 and α = 2)

Fig. 2.14 The Rayleigh pdf (α = α1 and α = α2 with α1 < α2)

Fig. 2.15 The logistic pdf (k = k1 and k = k2 with k1 > k2)

Example 2.5.19 The pdf (Balakrishnan 1992)

f(r) = k e^{−kr} / (1 + e^{−kr})²    (2.5.30)

shown in Fig. 2.15 is called a logistic pdf, where k > 0. ♦

Example 2.5.20 The pdf

f(r) = {1/(β^α Γ(α))} r^{α−1} exp(−r/β) u(r)    (2.5.31)

shown in Fig. 2.16 is called a gamma pdf and the distribution is denoted by G(α, β), where α > 0 and β > 0. It is clear from (2.5.25) and (2.5.31) that the gamma pdf with α = 1 is the same as an exponential pdf. ♦

Fig. 2.16 The gamma pdf (β = 1; α = 0.5, 1, 2)

Example 2.5.21 The pdf

f(r) = {r^{α−1} (1 − r)^{β−1} / B̃(α, β)} u(r) u(1 − r)    (2.5.32)

shown in Fig. 2.17 is called a beta pdf and the distribution is denoted by B(α, β), where α > 0 and β > 0. ♦

In (2.5.32), B̃(α, β) is the beta function described in (1.4.95). Unless a confusion


arises regarding the beta function B̃(α, β) and the beta distribution B(α, β), we often
use B(α, β) for both the beta function and beta distribution.
Table 2.1 shows some general properties of the beta pdf f(r) shown in (2.5.32). When α = 1 and β > 1, the pdf f(r) is decreasing in (0, 1), f(0) = β, and f(1) = 0. When α > 1 and β = 1, the pdf f(r) is increasing in (0, 1), f(0) = 0, and f(1) = α. In addition, because

f′(r) = {r^{α−2} (1 − r)^{β−2} / B̃(α, β)} {α − 1 − (α + β − 2)r} u(r) u(1 − r),    (2.5.33)

the pdf f(r) increases in (0, (α − 1)/(α + β − 2)) and decreases in ((α − 1)/(α + β − 2), 1), and f(0) = f(1) = 0 when α > 1 and β > 1. In other words, when α > 1 and β > 1, the pdf f(r) is a unimodal function, and has its maximum at r = (α − 1)/(α + β − 2): between 1/2 and 1 if α > β; at r = 1/2 if α = β; and between 0 and 1/2 if α < β. The maximum point is closer to 0 when α is closer to 1 or when β is larger, and it is closer to 1 when β is closer to 1 or when α is larger. Such a property of the beta pdf can be used in the order statistics (David and Nagaraja 2003) of discrete distributions.
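The mode location (α − 1)/(α + β − 2) for α > 1 and β > 1 can be confirmed by a crude grid search over the unnormalized beta kernel (a sketch; the normalizing constant does not affect the argmax):

```python
def beta_kernel(r, a, b):
    """Unnormalized beta density r**(a-1) * (1-r)**(b-1); enough for the argmax."""
    return r ** (a - 1) * (1 - r) ** (b - 1)

for a, b in [(2.0, 5.0), (3.0, 3.0), (7.0, 2.0)]:
    grid = [i / 10000 for i in range(1, 10000)]
    r_hat = max(grid, key=lambda r: beta_kernel(r, a, b))
    assert abs(r_hat - (a - 1) / (a + b - 2)) < 1e-3
```

The three parameter pairs land the maximum below, at, and above r = 1/2, matching the α < β, α = β, and α > β cases above.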
 
Example 2.5.22 The pdf of the distribution B(1/2, 1/2) is

f(r) = 1/{π √(r(1 − r))} u(r) u(1 − r).    (2.5.34)

Letting r = cos²v, we have ∫_0^1 f(r) dr = (1/π) ∫_{π/2}^{0} {−2 cos v sin v / (cos v sin v)} dv = 1. The pdf (2.5.34) is also called the inverse sine pdf. ♦
136 2 Fundamentals of Probability

Fig. 2.17 The beta pdf f(r) for several values of (α, β), including (0.7, 0.3), (1, 3), and (2, 5)

Table 2.1 Characteristics of a beta pdf f(r), 0 < r < 1

               | 0 < α < 1                      | α = 1                           | α > 1
    0 < β < 1  | Decreasing and then increasing | Increasing, f(0) = β            | Increasing
    β = 1      | Decreasing, f(1) = α           | f(r) = 1                        | Increasing, f(0) = 0, f(1) = α
    β > 1      | Decreasing                     | Decreasing, f(0) = β, f(1) = 0  | Increasing and then decreasing, f(0) = 0, f(1) = 0

2.5.3 Mixed Spaces

Let {P_i}_{i=1}^∞ be probability measures on a common measurable space (Ω, F) and {a_i}_{i=1}^∞ be non-negative numbers such that ∑_{i=1}^∞ a_i = 1. Then, the set function

    P(A) = ∑_{i=1}^∞ a_i P_i(A)                    (2.5.35)

is also a probability measure on (Ω, F). When some of {P_i}_{i=1}^∞ are discrete while others are continuous, the probability measure (2.5.35) is called a mixed probability measure. An important example of the mixed probability measure is the sum

    P(A) = λ ∑_{x∈A_d} p(x) + (1 − λ) ∫_{A_c} f(x)dx                    (2.5.36)

of a continuous probability measure and a discrete probability measure, where 0 < λ < 1, A_d is a discrete event, A_c is a continuous event, A = A_d ∪ A_c, f is a pdf, and p is a pmf.
Example 2.5.23 (Thomas 1986) Consider a distribution obtained by combining a point mass of 1/4 at r = 1/2, a point mass of 1/2 at r = 3/4, and a uniform density for r ∈ [0, 1]. The distribution can then be described by the pdf

    f(r) = (1/4) δ(r − 1/2) + (1/2) δ(r − 3/4) + 1/4,  r ∈ [0, 1].                    (2.5.37)

This example implies that a pdf can be defined also for discrete and mixed spaces by using impulse functions. ♦
Example 2.5.24 The probability space with the pmf

    p(r) = 1/2, r = 0;  1/3, r = 1;  1/6, r = 2;  0, otherwise                    (2.5.38)

can also be represented by the pdf

    f(r) = (1/2) δ(r) + (1/3) δ(r − 1) + (1/6) δ(r − 2).                    (2.5.39)

Here, for example, p(0) = ∫_{0−}^{0+} f(x)dx = 1/2. ♦
Note that what Example 2.5.24 implies is not that the pmf (2.5.38) is the same as the pdf (2.5.39), but that a discrete probability space can be expressed in terms of both a pmf and a pdf. If we have

    ∫_{a−}^{a+} f(x)dx = p(a)                    (2.5.40)

for an integer a, then the pdf f(r) expressed in terms of impulse functions and the pmf p(r) represent the same probability space. In other words, to check whether a pmf p(r) and a pdf f(r) represent the same probability space or not, we are required to check whether the pmf p(r) and the pdf f(r) satisfy (2.5.40).

Appendices

Appendix 2.1 Continuity of Probability

Theorem 2.A.1 For a monotonic sequence {B_n}_{n=1}^∞ of events, the probability of the limit event is equal to the limit of the probabilities of the events in the sequence. In other words,

    P(lim_{n→∞} B_n) = lim_{n→∞} P(B_n)                    (2.A.1)

holds true.

Proof First, when {B_n}_{n=1}^∞ is a non-decreasing sequence, recollect that ∪_{i=1}^n B_i = B_n and ∪_{i=1}^∞ B_i = lim_{n→∞} B_n. Consider a sequence {F_i}_{i=1}^∞ such that F_1 = B_1 and F_n = B_n − ∪_{i=1}^{n−1} B_i = B_n ∩ B_{n−1}^c for n = 2, 3, . . .. Then, {F_n}_{n=1}^∞ are all mutually exclusive, ∪_{i=1}^n F_i = ∪_{i=1}^n B_i for any natural number n, and ∪_{i=1}^∞ F_i = ∪_{i=1}^∞ B_i = lim_{n→∞} B_n. Therefore, P(lim_{n→∞} B_n) = P(∪_{i=1}^∞ B_i) = P(∪_{i=1}^∞ F_i) = ∑_{i=1}^∞ P(F_i) = lim_{n→∞} ∑_{i=1}^n P(F_i) = lim_{n→∞} P(∪_{i=1}^n F_i) = lim_{n→∞} P(∪_{i=1}^n B_i), i.e.,

    P(lim_{n→∞} B_n) = lim_{n→∞} P(B_n)                    (2.A.2)

recollecting (2.2.15), Axiom 4 of probability, and ∪_{i=1}^n B_i = B_n.
Next, when {B_n}_{n=1}^∞ is a non-increasing sequence, {B_n^c}_{n=1}^∞ is a non-decreasing sequence, and thus, we have

    P(lim_{n→∞} B_n^c) = lim_{n→∞} P(B_n^c)                    (2.A.3)

from (2.A.2). Noting that lim_{n→∞} B_n^c = ∪_{i=1}^∞ B_i^c because {B_n^c}_{n=1}^∞ is a non-decreasing sequence and that ∩_{i=1}^∞ B_i = lim_{n→∞} B_n because {B_n}_{n=1}^∞ is a non-increasing sequence, we have lim_{n→∞} B_n^c = ∪_{i=1}^∞ B_i^c = (∩_{i=1}^∞ B_i)^c = (lim_{n→∞} B_n)^c. Thus the left-hand side of (2.A.3) can be written as P((lim_{n→∞} B_n)^c) = 1 − P(lim_{n→∞} B_n). Meanwhile, the right-hand side of (2.A.3) can easily be written as lim_{n→∞} P(B_n^c) = lim_{n→∞} {1 − P(B_n)} = 1 − lim_{n→∞} P(B_n). Then, (2.A.3) yields (2.A.1). ♠

The results of Theorem 2.A.1 that

    lim_{n→∞} P(B_n) = P(∪_{i=1}^∞ B_i)                    (2.A.4)

for a non-decreasing sequence {B_n}_{n=1}^∞ and that

    lim_{n→∞} P(B_n) = P(∩_{i=1}^∞ B_i)                    (2.A.5)

for a non-increasing sequence {B_n}_{n=1}^∞ are called the continuity from below and above of probability, respectively.
Theorem 2.A.1 deals with monotonic, i.e., non-decreasing and non-increasing,
sequences. The same result holds true more generally as we can see in the following
theorem:

Theorem 2.A.2 When the limit event lim_{n→∞} B_n of a sequence {B_n}_{n=1}^∞ of events exists, the probability of the limit event is equal to the limit of the probabilities of the events in the sequence. In other words,

    P(lim_{n→∞} B_n) = lim_{n→∞} P(B_n)                    (2.A.6)

holds true.

Proof First, recollect that, among the limit values of a sequence {a_n}_{n=1}^∞ of real numbers, the largest and smallest ones are denoted by lim sup_{n→∞} a_n and lim inf_{n→∞} a_n, respectively. When lim sup_{n→∞} a_n = lim inf_{n→∞} a_n, this value is called the limit of the sequence and denoted by lim_{n→∞} a_n. Now, noting that {∪_{k=n}^∞ B_k}_{n=1}^∞ is a non-increasing sequence, we have P(lim sup_{n→∞} B_n) = P(∩_{n=1}^∞ ∪_{k=n}^∞ B_k), i.e.,

    P(lim sup_{n→∞} B_n) = lim_{n→∞} P(∪_{k=n}^∞ B_k)                    (2.A.7)

from (1.5.9), (1.5.17), and (2.A.1). In the meantime, we have

    lim sup_{n→∞} P(B_n) ≤ lim_{n→∞} P(∪_{k=n}^∞ B_k)                    (2.A.8)

because P(B_n) ≤ P(∪_{k=n}^∞ B_k) from B_n ⊆ ∪_{k=n}^∞ B_k. From (2.A.7) and (2.A.8), we get

    lim sup_{n→∞} P(B_n) ≤ P(lim sup_{n→∞} B_n).                    (2.A.9)

Similarly, we get

    P(lim inf_{n→∞} B_n) = P(∪_{n=1}^∞ ∩_{k=n}^∞ B_k)
                         = lim_{n→∞} P(∩_{k=n}^∞ B_k)
                         ≤ lim inf_{n→∞} P(B_n)                    (2.A.10)

for the non-decreasing sequence {∩_{k=n}^∞ B_k}_{n=1}^∞. The last line

    lim_{n→∞} P(∩_{k=n}^∞ B_k) ≤ lim inf_{n→∞} P(B_n)                    (2.A.11)

of (2.A.10) is due to P(∩_{k=n}^∞ B_k) ≤ P(B_n) from ∩_{k=n}^∞ B_k ⊆ B_n. Now, (2.A.9) and (2.A.10) produce

    lim sup_{n→∞} P(B_n) ≤ P(lim sup_{n→∞} B_n)
                         = P(lim_{n→∞} B_n)
                         = P(lim inf_{n→∞} B_n)
                         ≤ lim inf_{n→∞} P(B_n)                    (2.A.12)

if lim_{n→∞} B_n exists. We get the desired result

    P(lim_{n→∞} B_n) = lim sup_{n→∞} P(B_n)
                     = lim inf_{n→∞} P(B_n)
                     = lim_{n→∞} P(B_n)                    (2.A.13)

by combining (2.A.12) and lim sup_{n→∞} P(B_n) ≥ lim inf_{n→∞} P(B_n). ♠


Definition 2.A.1 (continuity of probability) For the limit lim_{n→∞} B_n of a sequence {B_n}_{n=1}^∞ of events, {P(B_n)}_{n=1}^∞ converges to P(lim_{n→∞} B_n) as shown in (2.A.1) and (2.A.6). The relation

    P(lim_{n→∞} B_n) = lim_{n→∞} P(B_n)                    (2.A.14)

is called the continuity of probability.

In other words, the probability of the limit of a sequence of events is equal to the limit of the sequence of the probabilities of the events.
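As a concrete illustration of continuity from above (a sketch of ours, not from the text), take the uniform measure on (0, 1) and the non-increasing events B_n = (0, 1/2 + 1/n): then P(B_n) = min(1, 1/2 + 1/n) decreases to P(∩_{n=1}^∞ B_n) = P((0, 1/2]) = 1/2.

```python
from fractions import Fraction

# Under the uniform (Lebesgue) probability measure on (0, 1), the event
# B_n = (0, 1/2 + 1/n) has probability min(1, 1/2 + 1/n); the limit event
# is the interval (0, 1/2], of probability 1/2.
def P_Bn(n):
    return min(Fraction(1), Fraction(1, 2) + Fraction(1, n))

limit_event_prob = Fraction(1, 2)  # P(intersection of all B_n)
assert all(P_Bn(n) >= P_Bn(n + 1) for n in range(1, 200))         # monotone from above
assert abs(float(P_Bn(10 ** 9)) - float(limit_event_prob)) < 1e-8  # P(B_n) -> P(lim B_n)
```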

Appendix 2.2 Borel-Cantelli Lemma

Let us discuss the Borel-Cantelli lemma, which deals with the probability of upper
bound events.
Theorem 2.A.3 (Rohatgi and Saleh 2001) When the sum of the probabilities {P(B_n)}_{n=1}^∞ of a sequence {B_n}_{n=1}^∞ of events is finite, i.e., when ∑_{n=1}^∞ P(B_n) < ∞, the probability P(lim sup_{n→∞} B_n) of the upper bound of {B_n}_{n=1}^∞ is 0.

Proof First, from ∑_{k=1}^∞ P(B_k) = lim_{n→∞} {∑_{k=1}^{n−1} P(B_k) + ∑_{k=n}^∞ P(B_k)} = ∑_{k=1}^∞ P(B_k) + lim_{n→∞} ∑_{k=n}^∞ P(B_k), we get

    lim_{n→∞} ∑_{k=n}^∞ P(B_k) = 0.                    (2.A.15)

Now using (2.A.7) and the Boole inequality (2.3.2), we get P(lim sup_{n→∞} B_n) = lim_{n→∞} P(∪_{k=n}^∞ B_k) ≤ lim_{n→∞} ∑_{k=n}^∞ P(B_k), i.e.,

    P(lim sup_{n→∞} B_n) = 0                    (2.A.16)

from (2.A.15). ♠
Theorem 2.A.4 When {B_n}_{n=1}^∞ is a sequence of independent events and the sum ∑_{n=1}^∞ P(B_n) is infinite, i.e., ∑_{n=1}^∞ P(B_n) → ∞, the probability P(lim sup_{n→∞} B_n) of the upper bound of {B_n}_{n=1}^∞ is 1.

Proof First, note that P(lim sup_{n→∞} B_n) = lim_{n→∞} P(∪_{i=n}^∞ B_i), i.e.,

    P(lim sup_{n→∞} B_n) = lim_{n→∞} {1 − P(∩_{i=n}^∞ B_i^c)}                    (2.A.17)

as in the proof of Theorem 2.A.3. Next, if ∑_{k=1}^∞ P(B_k) → ∞, then ∑_{k=n}^∞ P(B_k) → ∞ because ∑_{k=1}^{n−1} P(B_k) ≤ n − 1 for any number n and ∑_{k=1}^∞ P(B_k) = ∑_{k=1}^{n−1} P(B_k) + ∑_{k=n}^∞ P(B_k). Therefore, we get P(∩_{i=n}^∞ B_i^c) = ∏_{i=n}^∞ P(B_i^c) = ∏_{i=n}^∞ {1 − P(B_i)} recollecting that {B_i}_{i=1}^∞ are independent of each other, and thus, {B_i^c}_{i=1}^∞ are independent of each other. Finally, noting that 1 − x ≤ e^{−x} for x ≥ 0, we get

    P(∩_{i=n}^∞ B_i^c) ≤ ∏_{i=n}^∞ exp{−P(B_i)}
                       = exp{−∑_{i=n}^∞ P(B_i)}
                       = 0,                    (2.A.18)

which proves the theorem when used in (2.A.17). ♠


 
When {B_n}_{n=1}^∞ is a sequence of independent events, the probability P(lim sup_{n→∞} B_n) of the upper bound event of {B_n}_{n=1}^∞ is either 0 or 1 from the Borel-Cantelli lemma. The Borel-Cantelli lemmas will be employed when we discuss the strong law of large numbers in Sect. 6.2.2.2.

Example 2.A.1 Assume P(X_n = 0) = 1/n² = 1 − P(X_n = 1) for a sequence {X_n}_{n=1}^∞ of independent events, and let B_n = {X_n = 0}. Then, from Theorem 2.A.3, we have P(lim sup_{n→∞} B_n) = P(B_n i.o.) = 0 because ∑_{n=1}^∞ P(B_n) = π²/6 < ∞. Therefore, when n is sufficiently large, the probabilities that X_n will be 0 and 1 are 0 and 1, respectively. In other words, lim_{n→∞} X_n = 1 almost surely. ♦

Example 2.A.2 Assume P(X_n = 0) = 1/n = 1 − P(X_n = 1) for a sequence {X_n}_{n=1}^∞ of independent events, and let B_n = {X_n = 0}. Then, from Theorem 2.A.4, we have P(B_n i.o.) = 1 or, equivalently, almost surely X_n = 0 infinitely often because ∑_{n=1}^∞ P(B_n) = ∞. On the other hand, B_n^c also occurs infinitely often almost surely because ∑_{n=1}^∞ P(B_n^c) = ∞. In other words, almost surely X_n is 0 infinitely many times, and at the same time, 1 infinitely many times. Consequently, the probability that lim_{n→∞} X_n does not exist, i.e., that X_n does not converge, is 1. ♦
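A finite-horizon simulation can only suggest, not prove, the contrast between Examples 2.A.1 and 2.A.2, but it is instructive: with P(X_n = 0) = 1/n² the last observed zero typically occurs very early, while with P(X_n = 0) = 1/n zeros keep recurring throughout the horizon. The sketch below (names are ours) records the index of the last zero up to a horizon N.

```python
import random

random.seed(1)

def last_zero(prob_of_zero, N):
    """Largest n <= N with X_n = 0, where P(X_n = 0) = prob_of_zero(n)."""
    last = 0
    for n in range(1, N + 1):
        if random.random() < prob_of_zero(n):
            last = n
    return last

N = 100000
early = last_zero(lambda n: 1.0 / n ** 2, N)   # Example 2.A.1: the sum converges
late = last_zero(lambda n: 1.0 / n, N)         # Example 2.A.2: the sum diverges
# Typically `early` is tiny while `late` is comparable to N, mirroring the lemma.
```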

Appendix 2.3 Measures and Lebesgue Integrals

The notions of length, area, volume, and weight that we encounter in our daily lives are examples of measures. The length of a rod, the area of a house, the volume of a ball,
and the weight of a package assign numbers to objects. They also assign numbers to
groups of objects.

A measure is a set function assigning a number to a set. Nonetheless, not all set
functions are measures. A measure should satisfy some conditions. For example, if
we consider the measure of weight, the weight of a bottle filled with water is the sum
of the weight of the bottle and that of the water. In other words, the measure of the
union of sets is equal to the sum of the measures of the sets for mutually exclusive
sets.
Definition 2.A.2 (measure) A non-negative additive function μ with the domain a σ-algebra is called a measure.

Here, an additive function is a function such that the value of the function for a countable union of sets is the same as the sum of the values of the function for the sets when the sets are mutually exclusive. In other words, a function μ satisfying

    μ(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ μ(A_i)                    (2.A.19)

for countable mutually exclusive sets {A_i}_{i=1}^∞ in a σ-algebra is called an additive function.
Example 2.A.3 Consider a finite set Ω and the collection F = 2Ω . Then, the num-
ber μ(A) of elements of A ∈ F is a measure. ♦
Theorem 2.A.5 For a measure μ on a σ-algebra F, let {A_n ∈ F}_{n=1}^∞ and A_1 ⊆ A_2 ⊆ · · · . Then, A = ∪_{n=1}^∞ A_n ∈ F and15 lim_{n→∞} μ(A_n) = μ(A).

Proof First, because F is a σ-algebra, A = ∪_{n=1}^∞ A_n is an element of F. Next, let B_1 = A_1 and B_n = A_n − A_{n−1} for n = 2, 3, . . .. Then, {B_n}_{n=1}^∞ are mutually exclusive, A_n = ∪_{i=1}^n B_i, and A = ∪_{n=1}^∞ B_n. Thus, lim_{n→∞} μ(A_n) = lim_{n→∞} μ(∪_{i=1}^n B_i) = lim_{n→∞} ∑_{i=1}^n μ(B_i) = ∑_{i=1}^∞ μ(B_i) and μ(A) = μ(∪_{i=1}^∞ B_i) = ∑_{i=1}^∞ μ(B_i) from (2.A.19) and, consequently, lim_{n→∞} μ(A_n) = μ(A). ♠

When Ω is a countable abstract space, a measure for a subset A of Ω can be defined as μ(A) = ∑_{ω∈A} μ_ω by first choosing arbitrarily a non-negative number μ_ω for each ω ∈ Ω.
Example 2.A.4 For an abstract space Ω = {3, 4, 5}, let μ_ω = 5 − ω for ω ∈ Ω. Then, μ(A) = ∑_{ω∈A} μ_ω is a measure. We have μ({3}) = 2, μ({4}) = 1, μ({5}) = 0, μ({3, 4}) = μ({3}) + μ({4}) = 3, μ({3, 5}) = μ({3}) + μ({5}) = 2, μ({4, 5}) = μ({4}) + μ({5}) = 1, and μ({3, 4, 5}) = μ({3}) + μ({4}) + μ({5}) = μ({3, 4}) + μ({5}) = μ({4, 5}) + μ({3}) = μ({3, 5}) + μ({4}) = 3. ♦

15 Note that lim_{n→∞} A_n = ∪_{n=1}^∞ A_n and lim_{n→∞} A_n = ∩_{n=1}^∞ A_n when A_1 ⊆ A_2 ⊆ · · · and A_1 ⊇ A_2 ⊇ · · · , respectively, as discussed in (1.5.8) and (1.5.9).

To consider measure in uncountable sets, we introduce the notion of elementary sets, based on which the measure is defined and then extended for general sets.

Definition 2.A.3 (rectangle) In a Euclidean space R^p of p dimensions, a set of the form {x = (x_1, x_2, . . . , x_p) : a_i ≤ x_i ≤ b_i, i = 1, 2, . . . , p} with −∞ < a_i ≤ b_i ≤ ∞ is called a rectangle, an interval, or a box.

In Definition 2.A.3, a_i ≤ x_i ≤ b_i can be replaced with a_i < x_i < b_i or a_i < x_i ≤ b_i. Note that a null set is also regarded as an interval.

Definition 2.A.4 (elementary set) A set is called an elementary set if it can be expressed as the union of a finite number of intervals.

Example 2.A.5 Examples of an elementary set and a non-elementary set are shown in Fig. 2.18. ♦

Definition 2.A.5 (outer measure; covering) Let μ be an additive, non-negative, and finite set function defined on the collection of all elementary sets. A collection {A_i}_{i=1}^∞ of elementary sets such that E ⊆ ∪_{i=1}^∞ A_i for E ⊆ R^p is called a covering of E, and the lower bound

    μ*(E) = inf ∑_{i=1}^∞ μ(A_i)                    (2.A.20)

of ∑_{i=1}^∞ μ(A_i) over all the coverings of E is called the outer measure of E.

In general, we have

    μ*(E) ≤ ∑_{i=1}^∞ μ*(B_i)                    (2.A.21)

when E = ∪_{i=1}^∞ B_i, and

    μ*(E) = μ(E)                    (2.A.22)

when E is an elementary set.

Example 2.A.6 Assume the sets shown in Fig. 2.18. Let the measure of the two-
dimensional interval

Fig. 2.18 Examples of an elementary set (1) and a non-elementary set (2) in two-dimensional space

    A_{a,b,c,d} = {(x_1, x_2) : a ≤ x_1 ≤ b, c ≤ x_2 ≤ d}                    (2.A.23)

be μ(A_{a,b,c,d}) = (b − a)(d − c). Let the set in Fig. 2.18 (1) be B_1. Then, we have B_1 ⊆ A_{0,2,0,2}, B_1 ⊆ A_{0,2,0,1} ∪ A_{1,2,0,2}, and B_1 ⊆ A_{0,2,0,1} ∪ A_{1,2,1,2}, among which the covering with the smallest measure is {A_{0,2,0,1}, A_{1,2,1,2}}. Thus, the outer measure of B_1 is μ*(B_1) = 2 + 1 = 3. Similarly, let the set in Fig. 2.18 (2) be B_2. Then, we have B_2 ⊆ A_{0,2,0,2}, B_2 ⊆ A_{0,2,0,1} ∪ A_{0,2,1,2}, B_2 ⊆ A_{0,2,0,1} ∪ A_{1,2,1,2}, . . ., among which the covering with the smallest measure is {A_{2(i−1)/n, 2, 2(i−1)/n, 2i/n}}_{i=1}^n as n → ∞. Thus, the outer measure of B_2 is μ*(B_2) = 4 lim_{n→∞} ∑_{i=1}^n (1/n)(1 − (i−1)/n) = 4 ∫₀¹ (1 − x)dx = 2. ♦
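The covering sum for B_2 in Example 2.A.6 can be evaluated for finite n; one can check that it equals 2 + 2/n, which decreases to the outer measure μ*(B_2) = 2 as n grows. A minimal sketch (names are ours):

```python
def covering_sum(n):
    """Total measure of the n-strip staircase covering of B_2 in Example 2.A.6:
    strip i has width 2(1 - (i - 1)/n) and height 2/n, so the sum is 2 + 2/n."""
    return sum(2.0 * (1.0 - (i - 1) / n) * (2.0 / n) for i in range(1, n + 1))

approximations = [covering_sum(n) for n in (1, 10, 100, 10000)]
assert covering_sum(1) == 4.0                   # a single strip covers the whole square
assert abs(covering_sum(10000) - 2.0) < 1e-3    # tends to mu*(B_2) = 2
```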

Definition 2.A.6 (finitely μ-measurable set; μ-measurable set) For a sequence {A_n}_{n=1}^∞ of elementary sets, a set A such that

    lim_{n→∞} μ*(A_n Δ A) = 0                    (2.A.24)

is called finitely μ-measurable, where Δ denotes the symmetric difference. A set is called μ-measurable if it is obtained from a countable union of finitely μ-measurable sets.

The collections of all finitely μ-measurable sets and μ-measurable sets are denoted by M_F(μ) and M(μ), respectively.

Theorem 2.A.6 The collection M(μ) is a σ-algebra, and the outer measure μ* is an additive set function on M(μ).

Proof Instead of a rigorous proof, we will simply discuss a brief outline. Assume two sequences {A_i}_{i=1}^∞ and {B_i}_{i=1}^∞ of elementary sets converging to A and B, respectively, when A and B are elements of M_F(μ). If we let d(A, B) = μ*(A Δ B), then we can show that M_F(μ) is an algebra by showing that A ∪ B and A ∩ B are included in M_F(μ) based on d(A_i ∪ A_j, B_i ∪ B_j) ≤ d(A_i, B_i) + d(A_j, B_j), d(A_i ∩ A_j, B_i ∩ B_j) ≤ d(A_i, B_i) + d(A_j, B_j), and |μ*(A) − μ*(B)| ≤ d(A, B). Moreover, we can show that μ* is finitely additive on M_F(μ) based on μ*(A) + μ*(B) = μ*(A ∪ B) + μ*(A ∩ B) and μ*(A ∩ B) = 0 when A ∩ B = ∅.
Now, when A ∈ M(μ), we can express A = ∪_{n=1}^∞ A_n for A_n ∈ M_F(μ), and we then have A = ∪_{n=1}^∞ A_n′ by letting A_1′ = A_1 and A_n′ = ∪_{i=1}^n A_i − ∪_{i=1}^{n−1} A_i for n = 2, 3, . . .. Based on this, we can show that μ* is additive on M(μ) by showing that μ*(A) = ∑_{i=1}^∞ μ*(A_i′). Finally, we can show that M(μ) is a σ-algebra based on the fact that any countable set operations on the sets in M(μ) can be obtained from a countable union of sets in M_F(μ). ♠
Based on Theorem 2.A.6, we can use μ∗ , instead of μ, as the measure when we
deal with μ-measurable sets. In essence, we have first defined μ for elementary sets
and then extended μ into μ∗ , an additive set function on the σ -algebra M(μ).
Definition 2.A.7 (Lebesgue measure) The Lebesgue measure in the Euclidean space R^p is defined as

    μ(A) = ∑_{i=1}^n m(I_i)                    (2.A.25)

for A = ∪_{i=1}^n I_i, where {I_i}_{i=1}^n are non-overlapping intervals and

    m(I) = ∏_{k=1}^p (b_k − a_k)                    (2.A.26)

with I = {x = (x_1, x_2, . . . , x_p) : a_k ≤ x_k ≤ b_k, k = 1, 2, . . . , p} an interval in R^p.

Definition 2.A.7 is based on the fact that any elementary set can be obtained from a union of non-overlapping intervals {I_i}_{i=1}^n. An open set can be obtained from a
. An open set can be obtained from a
countable union of open intervals and is a μ-measurable set. Similarly, a closed set
is the complement of an open set and is also a μ-measurable set because M(μ) is
a σ -algebra. As discussed in Definition 2.2.7, the collection of all Borel sets is a σ -
algebra and is called the Borel σ -algebra or Borel field. In addition, a μ-measurable
set can always be expressed as the union of a Borel set and a set which is of measure 0
and is mutually exclusive of the Borel set. Under the Lebesgue measure, all countable
sets and some16 uncountable sets are of measure 0.
Example 2.A.7 In the one-dimensional space, the Lebesgue measure of an interval
[a, b] is the length μ ([a, b]) = b − a of the interval. The Lebesgue measure of the
set Q of rational numbers is μ(Q) = 0. ♦

16 The Cantor set discussed in Example 1.1.46 is one such example.



Definition 2.A.8 (measure space; measurable space) In a metric space X, if there exist a σ-algebra M of measurable sets composed of subsets of X and a non-negative additive set function μ, then X is called a measure space. Here, if X ∈ M, then (X, M, μ) is called a measurable space.

Example 2.A.8 In the space X = R p , we have the Lebesgue measure and the col-
lection M of all sets measurable by the Lebesgue measure. Then, it is easy to see
that X is a measure space. ♦

Example 2.A.9 In the space X = J+ , let the number of elements in a set be the
measure μ of the set and let the collection of all subsets of X be M. Then, (X, M, μ)
is a measurable space. ♦

Definition 2.A.9 (measurable function) When the set {x : f (x) > a} is always a
measurable set, a real function f defined on a measurable space is called a measurable
function.

Example 2.A.10 Continuous functions in R p are all measurable functions. ♦

Example 2.A.11 If f is a measurable function, so is |f|. If f and g are both measurable functions, then max(f, g) and min(f, g) are measurable functions. ♦

Example 2.A.12 If {f_n}_{n=1}^∞ is a sequence of measurable functions, then sup_n f_n(x) and lim sup_{n→∞} f_n(x) are measurable functions. ♦

Definition 2.A.10 (simple function) When the range of a function on a measurable space is finite, the function is called a simple function.

Example 2.A.13 When the range of a simple function f is {c_1, c_2, . . . , c_n}, we have f(x) = ∑_{i=1}^n c_i K_{B_i}(x), where B_i = {x : f(x) = c_i} and

    K_E(x) = 1, x ∈ E;  0, x ∉ E                    (2.A.27)

is the indicator function of E. ♦

Theorem 2.A.7 There exists a sequence {f_n}_{n=1}^∞ of simple functions such that lim_{n→∞} f_n(x) = f(x) for any real function f defined on a measurable space. If f is a measurable function, then {f_n}_{n=1}^∞ can be chosen as a sequence of measurable functions, and if f ≥ 0, {f_n}_{n=1}^∞ can be chosen to increase monotonically.

Proof When f ≥ 0, let B_{n,i} = {x : (i−1)/2^n ≤ f(x) < i/2^n} and F_n = {x : f(x) ≥ n}, and then choose

    f_n(x) = ∑_{i=1}^{n2^n} ((i−1)/2^n) K_{B_{n,i}}(x) + n K_{F_n}(x).                    (2.A.28)

More generally, by letting f⁺ = max(f, 0), f⁻ = −min(f, 0), and f = f⁺ − f⁻, we can prove the theorem easily. ♠
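The construction (2.A.28) is easy to code for a concrete f. The sketch below (names are ours) builds f_n for f(x) = x² and checks that the approximations are monotone non-decreasing in n and converge pointwise wherever f(x) < n.

```python
import math

def f_n(x, n, f):
    """Simple-function approximation (2.A.28) of a non-negative f: the value is
    (i - 1)/2^n where (i - 1)/2^n <= f(x) < i/2^n, and n where f(x) >= n."""
    y = f(x)
    if y >= n:
        return float(n)
    return math.floor(y * 2 ** n) / 2 ** n

f = lambda t: t * t
for x in (0.0, 0.3, 1.7, 9.9):
    vals = [f_n(x, n, f) for n in range(1, 25)]
    assert all(a <= b for a, b in zip(vals, vals[1:]))   # monotone non-decreasing in n
    assert f(x) >= 24 or abs(vals[-1] - f(x)) < 1e-4     # pointwise convergence
```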
Definition 2.A.11 (Lebesgue integral) In a measurable space (X, M, μ), let s(x) = ∑_{i=1}^n c_i K_{B_i}(x) be a measurable simple function, where c_i > 0 and x ∈ X. In addition, let E ∈ M and I_E(s) = ∑_{i=1}^n c_i μ(E ∩ B_i). Then, for a non-negative and measurable function f,

    ∫_E f dμ = sup I_E(s)                    (2.A.29)

is called the Lebesgue integral.


In Definition 2.A.11, the upper bound is obtained over all measurable simple functions s such that 0 ≤ s ≤ f. In the meantime, when the function f is not always positive, the Lebesgue integral can be defined as

    ∫_E f dμ = ∫_E f⁺ dμ − ∫_E f⁻ dμ                    (2.A.30)

if at least one of ∫_E f⁺ dμ and ∫_E f⁻ dμ is finite, where f⁺ = max(f, 0) and f⁻ = −min(f, 0). Note that f = f⁺ − f⁻ and that f⁺ and f⁻ are measurable functions. If both ∫_E f⁺ dμ and ∫_E f⁻ dμ are finite, then ∫_E f dμ is finite and the function f is called Lebesgue integrable on E for μ, which is expressed as f ∈ L(μ) on E.
Based on mensuration by parts, the Riemann integral is the sum of products
of the value of a function in an arbitrarily small interval composing the integral
region and the length of the interval. On the other hand, the Lebesgue integral is the
sum of products of the value of a function and the measure of the interval in the
domain corresponding to an arbitrarily small interval in the range of the function.
The Lebesgue integral exists not only for all Riemann integrable functions but also
for other functions while the Riemann integral exists only when the function is at
least piecewise continuous.
Some of the properties of the Lebesgue integral are as follows:
(1) If a function f is measurable on E and bounded and μ(E) is finite, then f ∈ L(μ) on E.
(2) If the measure μ(E) is finite and a ≤ f ≤ b, then aμ(E) ≤ ∫_E f dμ ≤ bμ(E).
(3) If f, g ∈ L(μ) on the set E and f(x) ≤ g(x) for x ∈ E, then ∫_E f dμ ≤ ∫_E g dμ.
(4) If f ∈ L(μ) on the set E and c is a finite constant, then ∫_E cf dμ = c ∫_E f dμ and cf ∈ L(μ).
(5) If f ∈ L(μ) on the set E, then |f| ∈ L(μ) and |∫_E f dμ| ≤ ∫_E |f| dμ.
(6) If a function f is measurable on the set E and μ(E) = 0, then ∫_E f dμ = 0.
(7) If a function f is Lebesgue integrable on X and φ(A) = ∫_A f dμ on A ∈ M, then φ is additive on M.
(8) Let A ∈ M, B ⊆ A, and μ(A − B) = 0. Then, ∫_A f dμ = ∫_B f dμ.
(9) Consider a sequence {f_n}_{n=1}^∞ of measurable functions such that lim_{n→∞} f_n(x) = f(x) for E ∈ M and x ∈ E. If there exists a function g ∈ L(μ) such that |f_n(x)| ≤ g(x), then lim_{n→∞} ∫_E f_n dμ = ∫_E f dμ.
(10) If a function f is Riemann integrable on [a, b], then f is Lebesgue integrable and the Lebesgue integral with the Lebesgue measure is the same as the Riemann integral.

Appendix 2.4 Non-measurable Sets

Assume the open unit interval J = (0, 1) and the set Q of rational numbers in the real space R. Consider the translation operator T_t : R → R such that T_t(x) = x + t for x ∈ R. Suppose the countable set Γ_t = T_t Q, i.e.,

    Γ_t = {t + q : q ∈ Q}.                    (2.A.31)

For example, we have Γ_5445 = {q + 5445 : q ∈ Q} = Q and Γ_π = {q + π : q ∈ Q} when t = 5445 and t = π, respectively.
It is clear that

    Γ_t ∩ J ≠ ∅                    (2.A.32)

because we can always find a rational number q such that 0 < t + q < 1 for any real number t. We have Γ_t = {t + q : q ∈ Q} = {s + (t − s) + q : q ∈ Q} = {s + q′ : q′ ∈ Q} = Γ_s and Γ_t ∩ Γ_s = ∅ when t − s is a rational number and an irrational number, respectively. Based on this observation, consider the collection

    K = {Γ_t : t ∈ R, distinct Γ_t only}                    (2.A.33)

of sets (Rao 2004). Then, we have the following facts:
(1) The collection K is a partition of R.
(2) There exists only one rational number t for Γ_t ∈ K.
(3) There exist uncountably many sets in K.
(4) For two distinct sets Γ_t and Γ_s in K, the number t − s is not a rational number.

Definition 2.A.12 (Vitali set) Based on the axiom of choice17 and (2.A.32), we can obtain an uncountable set

    V_0 = {x : x ∈ Γ_t ∩ J, Γ_t ∈ K},                    (2.A.34)

where x represents a number in the interval (0, 1) and an element of Γ_t ∈ K. The set V_0 is called the Vitali set.

Note that the points in the Vitali set V_0 are all in the interval (0, 1) and have a one-to-one correspondence with the sets in K. Denoting the enumeration of all the rational numbers in the interval (−1, 1) by {α_i}_{i=1}^∞, we get the following theorem:
Theorem 2.A.8 For the Vitali set V_0,

    (0, 1) ⊆ ∪_{i=1}^∞ T_{α_i} V_0 ⊆ (−1, 2)                    (2.A.35)

holds true.

Proof First, −1 < α_i + x < 2 because −1 < α_i < 1 and any point x in V_0 satisfies 0 < x < 1. In other words, T_{α_i} x ∈ (−1, 2), and therefore

    ∪_{i=1}^∞ T_{α_i} V_0 ⊆ (−1, 2).                    (2.A.36)

Next, for any point x in (0, 1), x ∈ Γ_t with an appropriately chosen t as we have observed in (2.A.32). Then, we have Γ_t = Γ_x and x ∈ Γ_t = Γ_x because x − t is a rational number. Now, denoting a point in Γ_x ∩ V_0 by y, we have y = x + q for some q ∈ Q because Γ_x ∩ V_0 ≠ ∅, and therefore x − y ∈ Q. Here, x − y is a rational number in (−1, 1) because 0 < x, y < 1 and, consequently, we can put x − y = α_i: in other words, x = y + α_i = T_{α_i} y ∈ T_{α_i} V_0. Thus, we have

    (0, 1) ⊆ ∪_{i=1}^∞ T_{α_i} V_0.                    (2.A.37)

Subsequently, we get (2.A.35) from (2.A.36) and (2.A.37). ♠



Theorem 2.A.9 The sets {T_{α_i} V_0}_{i=1}^∞ are all mutually exclusive: in other words,

    (T_{α_i} V_0) ∩ (T_{α_j} V_0) = ∅                    (2.A.38)

for i ≠ j.

17 The axiom of choice can be expressed as “For a non-empty set B ⊆ A, there exists a choice function f : 2^A → A such that f(B) ∈ B for any set A.” The axiom of choice can be phrased in various expressions, and that in Definition 2.A.12 is based on “If we assume a partition P_S of S composed only of non-empty sets, then there exists a set B for which the intersection with any set in P_S is a singleton set.”

Proof We prove the theorem by contradiction. When i ≠ j or, equivalently, when α_i ≠ α_j, assume that (T_{α_i} V_0) ∩ (T_{α_j} V_0) is not a null set. Letting one element of the intersection be y, we have y = x + α_i = x′ + α_j for x, x′ ∈ V_0. It is clear that Γ_x = Γ_{x′} because x − x′ = α_j − α_i is a rational number. Thus, x = x′ from the definition of K, and therefore α_i = α_j: this is contradictory to α_i ≠ α_j. Consequently, (T_{α_i} V_0) ∩ (T_{α_j} V_0) = ∅. ♠

Theorem 2.A.10 No set in {T_{α_i} V_0}_{i=1}^∞ is Lebesgue measurable: in other words, T_{α_i} V_0 ∉ M(μ) for any i.


Proof We prove the theorem by contradiction. Assume that the sets {T_{α_i} V_0}_{i=1}^∞ are measurable. Then, from the translation invariance18 of a measure, they have the same measure. Denoting the Lebesgue measure of T_{α_i} V_0 by μ(T_{α_i} V_0) = β, we have

    μ((0, 1)) ≤ μ(∪_{i=1}^∞ T_{α_i} V_0) ≤ μ((−1, 2))                    (2.A.39)

from (2.A.35). Here, μ((0, 1)) = 1 and μ((−1, 2)) = 3. In addition, we have

    μ(∪_{i=1}^∞ T_{α_i} V_0) = ∑_{i=1}^∞ β                    (2.A.40)

because {T_{α_i} V_0}_{i=1}^∞ is a collection of mutually exclusive sets as we have observed in (2.A.38). Combining (2.A.39) and (2.A.40) leads us to

    1 ≤ ∑_{i=1}^∞ β ≤ 3,                    (2.A.41)

which can be satisfied neither with β = 0 nor with β ≠ 0. Consequently, no set in {T_{α_i} V_0}_{i=1}^∞, including V_0, is Lebesgue measurable. ♠

Exercises

Exercise 2.1 Obtain the algebra generated from the collection C = {{a}, {b}} of the
set S = {a, b, c, d}.
Exercise 2.2 Obtain the σ -algebra generated from the collection C = {{a}, {b}} of
the set S = {a, b, c, d}.

18 For any real number x, the measure of A = {a} is the same as that of A + x = {a + x}.

Exercise 2.3 Obtain the sample space S in the following random experiments:
(1) An experiment measuring the lifetime of a battery.
(2) An experiment in which an integer n is selected in the interval [0, 2] and then
an integer m is selected in the interval [0, n].
(3) An experiment of checking the color of, and the number written on, a ball selected
randomly from a box containing two red, one green, and two blue balls denoted
by 1, 2, . . . , 5, respectively.

Exercise 2.4 When P(A) = P(B) = P(AB), obtain P ( AB c + B Ac ).

Exercise 2.5 Consider rolling a fair die. For A = {1}, B = {2, 4}, and C =
{1, 3, 5, 6}, obtain P(A ∪ B), P(A ∪ C), and P(A ∪ B ∪ C).

Exercise 2.6 Consider the events A = (−∞, r ] and B = (−∞, s] with r ≤ s in the
sample space of real numbers.
(1) Express C = (r, s] in terms of A and B.
(2) Show that B = A ∪ C and A ∩ C = ∅.

Exercise 2.7 When ten distinct red and ten distinct black balls are randomly arranged
into a single line, find the probability that red and black balls are placed in an
alternating fashion.

Exercise 2.8 Consider two branches between two nodes in a circuit. One of the two
branches is a resistor and the other is a series connection of two resistors. Obtain the
probability that the two nodes are disconnected assuming that the probability for a
resistor to be disconnected is p and disconnection in a resistor is not influenced by
the status of other resistors.

Exercise 2.9 Show that Ac and B are independent of each other and that Ac and B c
are independent of each other when A and B are independent of each other.

Exercise 2.10 Assume the sample space S = {1, 2, 3} and event space F = 2 S .
Show that no two events, except S and ∅, are independent of each other for any
probability measure such that P(1) > 0, P(2) > 0, and P(3) > 0.

Exercise 2.11 For two events A and B, show the followings:


(1) If P(A) = 0, then P(AB) = 0.
(2) If P(A) = 1, then P(AB) = P(B).

Exercise 2.12 Among 100 lottery tickets sold each week, one is a winning ticket.
When a ticket costs 10 euros and we have 500 euros, does buying 50 tickets in one
week bring us a higher probability of getting the winning ticket than buying one
ticket over 50 weeks?

Exercise 2.13 In rolling a fair die twice, find the probability that the sum of the two
outcomes is 7 when we have 3 from the first rolling.

Exercise 2.14 When a pair of fair dice are rolled once, find P(a − 2b < 0), where
a and b are the face values of the two dice with a ≥ b.

Exercise 2.15 When we choose subsets A, B, and C from D = {1, 2, . . . , k} randomly, find the probability that C ∩ (A − B)^c = ∅.

Exercise 2.16 Denote the four vertices of a regular tetrahedron by A, B, C, and D. In each movement from one vertex to another, the probability of arriving at another vertex is 1/3 for each of the three vertices. Find the probabilities p_{n,A} and p_{n,B} that we arrive at A and B, respectively, after n movements starting from A. Obtain the values of p_{10,A} and p_{10,B} when n = 10.

Exercise 2.17 A box contains N balls each marked with a number 1, 2, . . ., and N ,
respectively. Each of N students with identification (ID) numbers 1, 2, . . ., and N ,
respectively, chooses a ball randomly from the box. If the number marked on the ball
and the ID number of the student are the same, then it is called a match.
(1) Find the probability of no match.
(2) Using conditional probability, obtain the probability in (1) again.
(3) Find the probability of k matches.

Exercise 2.18 In the interval [0, 1] on a line of real numbers, two points are chosen randomly. Find the probability that the distance between the two points is shorter than 1/2.

Exercise 2.19 Consider the probability space composed of the sample space S = {all pairs (k, m) of natural numbers} and probability measure

    P((k, m)) = α(1 − p)^(k+m−2),                    (2.E.1)

where α is a constant and 0 < p < 1.
(1) Determine the constant α. Then, obtain the probability P((k, m) : k ≥ m).
(2) Obtain the probability P((k, m) : k + m = r) as a function of r ∈ {2, 3, . . .}. Confirm that the result is a probability measure.
(3) Obtain the probability P((k, m) : k is an odd number).

Exercise 2.20 Obtain P(A ∩ B), P(A|B), and P(B|A) when P(A) = 0.7, P(B) = 0.5, and P([A ∪ B]^c) = 0.1.

Exercise 2.21 Three people shoot at a target. Let the event of a hit by the i-th
person be Ai for i = 1, 2, 3 and assume the three events are independent of each
other. When P (A1 ) = 0.7, P ( A2 ) = 0.9, and P (A3 ) = 0.8, find the probability that
only two people will hit the target.

Exercise 2.22 In testing circuit elements, let A = {defective element} and B = {element identified as defective}, and P(B|A) = p, P(B^c|A^c) = q, P(A) = r, and P(B) = s. Because the test is not perfect, two types of errors could occur: a false negative, ‘a defective element is identified to be fine’; or a false positive, ‘a functional element is identified to be defective’. Assume that the production and testing of the elements can be adjusted such that the parameters p, q, r, and s are very close to 0 or 1.
(1) For each of the four parameters, explain whether it is more desirable to make it
closer to 0 or 1.
(2) Describe the meaning of the conditional probabilities P ( B c | A) and P (B |Ac ).
(3) Describe the meaning of the conditional probabilities P ( Ac | B) and P (A |B c ) .
(4) Given the values of the parameters p, q, r , and s, obtain the probabilities in (2)
and (3).
(5) Obtain the sample space of this experiment.

Exercise 2.23 For three events A, B, and C, show the following results without
using Venn diagrams:

(1) P(A ∪ B) = P(A) + P(B) − P(AB).


(2) P(A ∪ B ∪ C) = P(A) + P(B) + P(C) + P(ABC) − P(AB) − P(AC) −
P(BC).
(3) Union upper bound, i.e.,

P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i).    (2.E.2)


Exercise 2.24 For the sample space S and events E, F, and {B_i}_{i=1}^∞, show that the
conditional probability satisfies the axioms of probability as follows:
(1) 0 ≤ P(E|F) ≤ 1.
(2) P(S|F) = 1.
(3) P(∪_{i=1}^∞ B_i | F) = Σ_{i=1}^∞ P(B_i | F) when the events {B_i}_{i=1}^∞ are mutually exclusive.

Exercise 2.25 Assume an event B and a collection {A_i}_{i=1}^n of events, where {A_i}_{i=1}^n
is a partition of the sample space S.
(1) Explain whether or not A_i and A_j for i ≠ j are independent of each other.
(2) Obtain a partition of B using {A_i}_{i=1}^n.
(3) Show the total probability theorem P(B) = Σ_{i=1}^n P(B|A_i) P(A_i).
(4) Show Bayes' theorem P(A_k | B) = P(B|A_k)P(A_k) / Σ_{i=1}^n P(B|A_i)P(A_i).

Exercise 2.26 Box 1 contains two red and three green balls and Box 2 contains one
red and four green balls. Obtain the probability of selecting a red ball when a ball is
selected from a randomly chosen box.
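The total probability theorem resolves this directly; the snippet below (box contents hard-coded from the exercise) conditions on which box was chosen.

```python
# (red, green) counts and selection probability for each box
boxes = [((2, 3), 0.5),   # Box 1: two red, three green
         ((1, 4), 0.5)]   # Box 2: one red, four green

# Total probability theorem: P(red) = sum_i P(red | Box i) P(Box i)
p_red = sum(pb * red / (red + green) for (red, green), pb in boxes)
```

This evaluates to 1/2 · 2/5 + 1/2 · 1/5 = 3/10.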

Exercise 2.27 Box i contains i red and (n − i) green balls for i = 1, 2, . . . , n.


Choosing Box i with probability 2i/{n(n + 1)}, obtain the probability of selecting a red
ball when a ball is selected from the box chosen.

Exercise 2.28 A group of people elects one person via rock-paper-scissors. If there
is only one person who wins, then the person is chosen; otherwise, the rock-paper-
scissors is repeated. Assume that the probabilities of rock, paper, and scissors are each 1/3
for every person and are not affected by other people. Obtain the probability p_{n,k} that n
people will elect one person in k trials.

Exercise 2.29 In an election, Candidates A and B will get n and m votes, respec-
tively. When n > m, find the probability that Candidate A will always have more
counts than Candidate B during the ballot-counting.

Exercise 2.30 A type O cell is cultured at time 0. After one hour, the cell will
become

two type O cells, with probability 1/4;
one type O cell and one type M cell, with probability 2/3;    (2.E.3)
two type M cells, with probability 1/12.

A new type O cell behaves like the first type O cell and a type M cell will disappear in
one hour, where a change is not influenced by any other change. Find the probability
β_0 that no type M cell will appear until n + 1/2 hours from the starting time.

Exercise 2.31 Find the probability of the event A that 5 or 6 appears k times when
a fair die is rolled n times.
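Each roll shows 5 or 6 with probability 1/3, so the number of such rolls is binomial; a minimal sketch:

```python
from math import comb

def prob_k_of_n(n: int, k: int) -> float:
    """P(exactly k rolls in {5, 6} out of n rolls of a fair die):
    binomial with success probability 1/3."""
    return comb(n, k) * (1 / 3) ** k * (2 / 3) ** (n - k)

total = sum(prob_k_of_n(10, k) for k in range(11))  # probabilities sum to 1
```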

Exercise 2.32 Consider a communication channel for signals of binary digits (bits)
0 and 1. Due to the influence of noise, two types of errors can occur as shown in
Fig. 2.19: specifically, 0 and 1 can be identified to be 1 and 0, respectively. Let the
transmitted and received bits be X and Y , respectively. Assume a priori probability
of P (X = 1) = p for 1 and P (X = 0) = 1 − p for 0, and the effect of noise on a
bit is not influenced by that on other bits. Denote the probability that the received bit
is i when the transmitted bit is i by P ( Y = i| X = i) = pii for i = 0, 1.

Fig. 2.19 A binary communication channel: transmitted bit 0 is received as 0 with probability p_00 = 1 − p_01 and as 1 with probability p_01; transmitted bit 1 is received as 1 with probability p_11 = 1 − p_10 and as 0 with probability p_10

(1) Obtain the probabilities p10 = P(Y = 0|X = 1) and p01 = P(Y = 1|X = 0)
that an error occurs when bits 1 and 0 are transmitted, respectively.
(2) Obtain the probability that an error occurs.
(3) Obtain the probabilities P(Y = 1) and P(Y = 0) that the received bit is identified
to be 1 and 0, respectively.
(4) Obtain all a posteriori probabilities P(X = j|Y = k) for j = 0, 1 and k = 0, 1.
(5) When p = 0.5, obtain P(X = 1|Y = 0), P(X = 1|Y = 1), P(Y = 1), and P(Y =
0) for a symmetric channel with p00 = p11 .
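Parts (2)-(4) are a direct application of the total probability and Bayes' theorems; the sketch below (the dictionary layout is an illustrative choice) computes P(Y = k) and all posteriors P(X = j | Y = k) from p, p00, and p11.

```python
def channel_probs(p: float, p00: float, p11: float):
    """Return P(Y=y) and the a posteriori probabilities P(X=x | Y=y)
    for the binary channel with priors P(X=1) = p, P(X=0) = 1 - p."""
    prior = {0: 1.0 - p, 1: p}
    # cond[(y, x)] = P(Y = y | X = x)
    cond = {(0, 0): p00, (1, 0): 1.0 - p00,
            (1, 1): p11, (0, 1): 1.0 - p11}
    p_y = {y: sum(cond[(y, x)] * prior[x] for x in (0, 1)) for y in (0, 1)}
    post = {(x, y): cond[(y, x)] * prior[x] / p_y[y]
            for x in (0, 1) for y in (0, 1)}
    return p_y, post

p_y, post = channel_probs(0.5, 0.9, 0.9)  # symmetric channel, p = 0.5
```

For the symmetric case in part (5), P(Y = 1) = 1/2 and P(X = 1 | Y = 1) = p11.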

Exercise 2.33 Assume a pile of n integrated circuits (ICs), among which m are
defective ones. When an IC is chosen randomly from the pile, the probability that
the IC is defective is α_1 = m/n as shown in Example 2.3.8.
(1) Assume we pick one IC and then one more IC without replacing the first one
back to the pile. Obtain the probabilities α1,1 , α0,1 , α1,0 , and α0,0 that both are
defective, the first one is not defective and the second one is defective, the first
one is defective and the second one is not defective, and neither the first nor the
second one is defective, respectively.
(2) Now assume we pick one IC and then one more IC after replacing the first one
back to the pile. Obtain the probabilities α1,1 , α0,1 , α1,0 , and α0,0 again.
(3) Assume we pick two ICs randomly from the pile. Obtain the probabilities β0 , β1 ,
and β2 that neither is defective, one is defective and the other is not defective,
and both are defective, respectively.

Exercise 2.34 Box 1 contains two old and three new erasers and Box 2 contains one
old and six new erasers. We perform the experiment “choose one box randomly and
pick an eraser at random” twice, during which we discard the first eraser picked.
(1) Obtain the probabilities P2 , P1 , and P0 that both erasers are old, one is old and
the other is new, and both erasers are new, respectively.
(2) When both erasers are old, obtain the probability P3 that one is from Box 1 and
the other is from Box 2.

Exercise 2.35 The probability for a couple to have k children is αp^k with 0 < p < 1.
(1) The color of the eyes being brown for a child is of probability b and is independent
of that of other children. Obtain the probability that the couple has r children
with brown eyes.
(2) Assuming that a child being a girl or a boy is of probability 1/2, obtain the proba-
bility that the couple has r boys.
(3) Assuming that a child being a girl or a boy is of probability 1/2, obtain the prob-
ability that the couple has at least two boys when the couple has at least one
boy.

Exercise 2.36 For the pmf p(x) = _{r+x−1}C_{r−1} α^r (1 − α)^x, x ∈ J_0 introduced in
(2.5.14), show

lim_{r→∞} p(x) = (λ^x / x!) e^{−λ}    (2.E.4)

when α = r/(r + λ), which implies lim_{r→∞} NB(r, r/(r+λ)) = P(λ).
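The limit can be probed numerically: with α = r/(r + λ) and r large, the negative binomial pmf should already be very close to the Poisson pmf. A sketch (parameter values are illustrative):

```python
from math import comb, exp, factorial

def nb_pmf(r: int, alpha: float, x: int) -> float:
    """Negative binomial pmf C(r+x-1, r-1) * alpha^r * (1-alpha)^x."""
    return comb(r + x - 1, r - 1) * alpha ** r * (1.0 - alpha) ** x

def poisson_pmf(lam: float, x: int) -> float:
    return lam ** x * exp(-lam) / factorial(x)

lam, r = 2.0, 100_000
alpha = r / (r + lam)
max_gap = max(abs(nb_pmf(r, alpha, x) - poisson_pmf(lam, x))
              for x in range(10))
```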
Exercise 2.37 A person plans to buy a car of price N units. The person has k units
and wishes to earn the remaining from a game. In the game, the person wins and
loses 1 unit when the outcome is a head and a tail, respectively, from a toss of a coin
with probability p for a head and q = 1 − p for a tail. Assuming 0 < k < N and
the person continues the game until the person earns enough for the car or loses all
the money, find the probability that the person loses all the money. This problem is
called the gambler’s ruin problem.
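For p ≠ 1/2 the well-known solution of this problem is r_k = (ρ^N − ρ^k)/(ρ^N − 1) with ρ = q/p. The sketch below does not derive it, but checks that it satisfies the defining recurrence r_k = q r_{k−1} + p r_{k+1} and the boundary conditions r_0 = 1, r_N = 0 (parameter values are arbitrary).

```python
def ruin_probability(k: int, N: int, p: float) -> float:
    """Gambler's ruin: probability of reaching 0 before N, starting
    from k units, with win probability p per toss (p != 1/2)."""
    rho = (1.0 - p) / p
    return (rho ** N - rho ** k) / (rho ** N - 1.0)

N, p, q = 10, 0.45, 0.55
r = [ruin_probability(k, N, p) for k in range(N + 1)]
```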
Exercise 2.38 A large number of bundles, each with 25 tulip bulbs, are contained in
a large box. The bundles are of type R5 and R15 with portions 3/4 and 1/4, respectively.
A type R5 bundle contains five red and twenty white bulbs and a type R15 bundle
contains fifteen red and ten white bulbs. A bulb, chosen randomly from a bundle
selected at random from the box, is planted.
(1) Obtain the probability p1 that a red tulip blossoms.
(2) Obtain the probability p2 that a white tulip blossoms.
(3) When a red tulip blossoms, obtain the conditional probability that the bulb is
from a type R15 bundle.
Exercise 2.39 For a probability space with the sample space Ω = J0 = {0, 1, . . .}
and pmf

p(x) = { 5c² + c, x = 0;  3 − 13c, x = 1;  c, x = 2;  0, otherwise;    (2.E.5)

determine the constant c.


Exercise 2.40 Show that

{x/(1 + x²)} φ(x) < Q(x) < (1/2) exp(−x²/2)    (2.E.6)

for x > 0, where φ(x) denotes the standard normal pdf, i.e., (2.5.27) with m = 0
and σ² = 1, and

Q(x) = (1/√(2π)) ∫_x^∞ exp(−t²/2) dt.    (2.E.7)
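Before proving the bounds analytically, it can be reassuring to check them numerically; Q(x) is available in closed form through the complementary error function as Q(x) = erfc(x/√2)/2. A sketch:

```python
from math import erfc, exp, pi, sqrt

def phi(x: float) -> float:
    """Standard normal pdf."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Q(x: float) -> float:
    """Gaussian tail probability, via the complementary error function."""
    return 0.5 * erfc(x / sqrt(2))

def bounds_hold(x: float) -> bool:
    """Check {x/(1+x^2)} phi(x) < Q(x) < (1/2) exp(-x^2/2)."""
    lower = x / (1 + x * x) * phi(x)
    upper = 0.5 * exp(-x * x / 2)
    return lower < Q(x) < upper

checks = all(bounds_hold(x) for x in (0.1, 0.5, 1.0, 2.0, 4.0))
```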

Exercise 2.41 Balls with colors C_1, C_2, . . ., C_n are contained in k boxes. Let the
probability of choosing Box B_j be P(B_j) = b_j and that of choosing a ball with color
C_i from Box B_j be P(C_i | B_j) = c_{ij}, where Σ_{i=1}^n c_{ij} = 1 and Σ_{j=1}^k b_j = 1. A box
is chosen first and then a ball is chosen from the box.

(1) Show that, if {c_{i1} = c_{i2} = · · · = c_{ik}}_{i=1}^n, the color of the ball chosen is independent of the choice of a box, i.e., P(C_i B_j) = P(C_i) P(B_j) for all i = 1, . . . , n and j = 1, . . . , k, for any values of {b_j}_{j=1}^k.
(2) When n = 2, k = 3, b_1 = b_3 = 1/4, and b_2 = 1/2, express the condition for
P(C_1 B_1) = P(C_1) P(B_1) to hold true in terms of {c_11, c_12, c_13}.

Exercise 2.42 Boxes 1, 2, and 3 contain four red and five green balls, one red
and one green ball, and one red and two green balls, respectively. Assume that
the probabilities of the event B_i of choosing Box i are P(B_1) = P(B_3) = 1/4 and
P(B_2) = 1/2. After a box is selected, a ball is chosen randomly from the box. Denote
the events that the ball is red and green by R and G, respectively.
(1) Are the events B1 and R independent of each other? Are the events B1 and G
independent of each other?
(2) Are the events B2 and R independent of each other? Are the events B3 and G
independent of each other?
Exercise 2.43 For the sample space Ω = {1, 2, 3, 4} with {P(i) = 1/4}_{i=1}^4, consider
A1 = {1, 3, 4}, A2 = {2, 3, 4}, and A3 = {3}. Are the three events A1 , A2 , and A3
independent of each other?

Exercise 2.44 Consider two consecutive experiments with possible outcomes A and
B for the first experiment and C and D for the second experiment. When P(AC) = 1/3,
P(AD) = 1/6, P(BC) = 1/6, and P(BD) = 1/3, are A and C independent of each other?

Exercise 2.45 Two people make an appointment to meet between 10 and 11 o’clock.
Find the probability that they can meet assuming that each person arrives at the
meeting place between 10 and 11 o’clock independently and waits only up to 10
minutes.
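The classical answer comes from the area of the band |t_1 − t_2| ≤ 1/6 in the unit square, namely 1 − (5/6)² = 11/36. A Monte Carlo sketch (trial count and seed are arbitrary) agrees:

```python
import random

def estimate_meeting(trials: int, wait: float = 10 / 60, seed: int = 1) -> float:
    """Both arrival times uniform on [0, 1] hour; they meet iff
    |T1 - T2| <= wait (10 minutes = 1/6 hour)."""
    rng = random.Random(seed)
    meets = sum(abs(rng.random() - rng.random()) <= wait for _ in range(trials))
    return meets / trials

estimate = estimate_meeting(200_000)
```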

Exercise 2.46 Consider two children. Assume any child can be a girl or a boy
equally likely. Find the probability p1 that both are boys when the elder is a boy and
the probability p2 that both are boys when at least one is a boy.

Exercise 2.47 There are three red and two green balls in Box 1, and four red and
three green balls in Box 2. A ball is randomly chosen from Box 1 and put into Box
2. Then, a ball is picked from Box 2. Find the probability that the ball picked from
Box 2 is red.

Exercise 2.48 Three people A, B, and C toss a coin each. The person whose outcome
is different from those of the other two wins. If the three outcomes are the same, then
the toss is repeated.
(1) Show that the game is fair, i.e., the probability of winning is the same for each
of the three people.

Table 2.2 Some probabilities in the game mighty

(A) Player G_1 murmurs "Oh! I do not have the joker."
(1) Probability of having the joker: 0 for Player G_1; 10/43 = 260/1118 for Players {G_i}_{i=2}^5; 3/43 = 78/1118 on the table (G_6).
(2) Probability of having the mighty: 5/26 = 215/1118 for Player G_1; 105/559 = 210/1118 for Players {G_i}_{i=2}^5; 63/1118 on the table (G_6).
(3) Probability of having either the mighty or joker: 5/26 = 215/1118 for Player G_1; 190/559 = 380/1118 for Players {G_i}_{i=2}^5; 135/1118 on the table (G_6).
(4) Probability of having at least one of the mighty and joker: 5/26 = 215/1118 for Player G_1; 425/1118 for Players {G_i}_{i=2}^5; 69/559 = 138/1118 on the table (G_6).
(5) Probability of having both the mighty and joker: 0 for Player G_1; 45/1118 for Players {G_i}_{i=2}^5; 3/1118 on the table (G_6).

(B) Player G_1 murmurs "Oh! I have neither the mighty nor the joker."
(1) Probability of having the joker: 0 for Player G_1; 10/43 = 70/301 for Players {G_i}_{i=2}^5; 3/43 = 21/301 on the table (G_6).
(2) Probability of having the mighty: 0 for Player G_1; 10/43 = 70/301 for Players {G_i}_{i=2}^5; 3/43 = 21/301 on the table (G_6).
(3) Probability of having either the mighty or joker: 0 for Player G_1; 110/301 for Players {G_i}_{i=2}^5; 40/301 on the table (G_6).
(4) Probability of having at least one of the mighty and joker: 0 for Player G_1; 125/301 for Players {G_i}_{i=2}^5; 41/301 on the table (G_6).
(5) Probability of having both the mighty and joker: 0 for Player G_1; 15/301 for Players {G_i}_{i=2}^5; 1/301 on the table (G_6).

(2) Find the probabilities that B wins exactly eight times and at least eight times
when the coins are tossed ten times, not counting the number of no winner.

Exercise 2.49 A game called mighty can be played by three, four, or five players.
When it is played with five players, 53 cards are used by adding one joker to a deck
of 52 cards. Among the 53 cards, the ace of spades is called the mighty, except when
the suit of spades19 is declared the royal suit. In the play, ten cards are distributed to
each of the five players {G_i}_{i=1}^5 and the remaining three cards are left on the table,
face side down. Assume that what Player G 1 murmurs is always true and consider
the two cases (A) Player G 1 murmurs “Oh! I do not have the joker.” and (B) Player
G 1 murmurs “Oh! I have neither the mighty nor the joker.” For convenience, let the
three cards on the table be Player G 6 . Obtain the following probabilities and thereby
confirm Table 2.2:
(1) Player G i has the joker.
(2) Player G i has the mighty.
(3) Player G i has either the mighty or the joker.

19When the suit of spades is declared the royal suit, the ace of diamonds, not the ace of spades,
becomes the mighty.

(4) Player G i has at least one of the mighty and the joker.
(5) Player G i has both the mighty and the joker.

Exercise 2.50 In a group of 30 men and 20 women, 40% of men and 60% of women
play piano. When a person in the group plays piano, find the probability that the
person is a man.

Exercise 2.51 The probabilities that an automobile passing through a toll gate is a car,
a truck, and a bus are 0.5, 0.3, and 0.2, respectively. Find the probability that 30 cars,
15 trucks, and 5 buses have passed when 50 automobiles have passed the toll gate.

References

N. Balakrishnan, Handbook of the Logistic Distribution (Marcel Dekker, New York, 1992)
P.J. Bickel, K.A. Doksum, Mathematical Statistics (Holden-Day, San Francisco, 1977)
H.A. David, H.N. Nagaraja, Order Statistics, 3rd edn. (Wiley, New York, 2003)
R.M. Gray, L.D. Davisson, An Introduction to Statistical Signal Processing (Cambridge University
Press, Cambridge, 2010)
A. Gut, An Intermediate Course in Probability (Springer, New York, 1995)
C.W. Helstrom, Probability and Stochastic Processes for Engineers, 2nd edn. (Prentice-Hall, Engle-
wood Cliffs, 1991)
S. Kim, Mathematical Statistics (in Korean) (Freedom Academy, Paju, 2010)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn.
(Prentice Hall, New York, 2008)
M. Loeve, Probability Theory, 4th edn. (Springer, New York, 1977)
E. Lukacs, Characteristic Functions, 2nd edn. (Griffin, London, 1970)
T.M. Mills, Problems in Probability (World Scientific, Singapore, 2001)
M.M. Rao, Measure Theory and Integration, 2nd edn. (Marcel Dekker, New York, 2004)
V.K. Rohatgi, A.K.Md.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New
York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
S.M. Ross, A First Course in Probability (Macmillan, New York, 1976)
S.M. Ross, Stochastic Processes, 2nd edn. (Wiley, New York, 1996)
A.N. Shiryaev, Probability, 2nd edn. (Springer, New York, 1996)
A.A. Sveshnikov (ed.), Problems in Probability Theory, Mathematical Statistics and Theory of
Random Functions (Dover, New York, 1968)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
P. Weirich, Conditional probabilities and probabilities given knowledge of a condition. Philos. Sci.
50(1), 82–95 (1983)
C.K. Wong, A note on mutually independent events. Am. Stat. 26(2), 27–28 (1972)
Chapter 3
Random Variables

Based on the description of probability in Chap. 2, let us now introduce and discuss
several topics on random variables: namely, the notions of the cumulative distribution
function, expected values, and moments. We will then discuss conditional distribution
and describe some of the widely-used distributions.

3.1 Distributions

Let us start by introducing the notion of the random variable and its distribution
(Gardner 1990; Leon-Garcia 2008; Papoulis and Pillai 2002). In describing the dis-
tributions of random variables, we adopt the notion of the cumulative distribution
function, which is a useful tool in characterizing the probabilistic properties of ran-
dom variables.

3.1.1 Random Variables

Generally, a random variable is a real function of which the domain is a sample space.
The range of a random variable X : Ω → R on a sample space Ω is S X = {x : x =
X (s), s ∈ Ω} ⊆ R. In fact, a random variable is not a variable but a function: yet, it
is customary to call it a variable. In many cases, a random variable is denoted by an
upper case alphabet such as X , Y , . . ..

Definition 3.1.1 (random variable) For a sample space Ω of the outcomes from a
random experiment, a function X that assigns a real number x = X (ω) to ω ∈ Ω is
called a random variable.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 161
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8_3

For a more precise definition of a random variable, we need the concept of a


measurable function.

Definition 3.1.2 (measurable function) Given a probability space (Ω, F, P), a real-
valued function g that maps the sample space Ω to the real line R is called a
measurable function when the condition

if B ∈ B(R), then g^{−1}(B) ∈ F    (3.1.1)

is satisfied.

Example 3.1.1 A real-valued function g for which g −1 (D) is a Borel set for every
open set D is called a Borel function, and is a measurable function. ♦

The random variable can be redefined as follows:

Definition 3.1.3 (random variable) A random variable is a measurable function


defined on a probability space.

In general, to show whether or not a function is a random variable is rather


complicated. However,
(A) every function g : Ω → R on a probability space (Ω, F, P) with the event
space F being a power set is a random variable, and
(B) almost all functions such as continuous functions, polynomials, unit step
function, trigonometric functions, limits of measurable functions, min and max of
measurable functions, etc. that we will deal with are random variables.

Example 3.1.2 (Romano and Siegel 1986) For the sample space Ω = {1, 2, 3}
and event space F = {Ω, ∅, {3}, {1, 2}}, assume the function g such that g(1) = 1,
g(2) = 2, and g(3) = 3. Then, g is not a random variable because g^{−1}({1}) = {1} ∉ F
although {1} ∈ B(R). ♦

Random variables can be classified into the following three classes:

Definition 3.1.4 (discrete random variable; continuous random variable; hybrid


random variable) A random variable is called a discrete random variable, continuous
random variable, or hybrid random variable when the range is a countable set, an
uncountable set, or the union of an uncountable set and a countable set, respectively.

A hybrid random variable is also called a mixed-type random variable. A discrete


random variable with finite range is sometimes called a finite random variable. The
probabilistic characteristics of a continuous random variable and a discrete random
variable can be described by the pdf and pmf, respectively. In the meantime, based on
Definitions 1.1.22 and 3.1.4, the range of a discrete random variable can be assumed
as subsets of {0, 1, . . .} or {1, 2, . . .}.

Example 3.1.3 When the outcome from a rolling of a fair die is n, let X 1 (n) = n
and

X_2(n) = { 0, n is an odd number;  1, n is an even number.    (3.1.2)

Then, X 1 and X 2 are both discrete random variables. ♦

Example 3.1.4 The random variables L, Θ, and D defined below are all continuous
random variables.
(1) When (x, y) denotes the coordinate of a randomly selected point Q inside the
unit circle centered at the origin O, the length L(Q) = √(x² + y²) of OQ. The
angle Θ(Q) = tan⁻¹(y/x) formed by OQ and the positive x-axis.
(2) The difference D(r ) = |r − r̃ | between a randomly chosen real number r and
its rounded integer r̃ . ♦

Example 3.1.5 Assume the response g ∈ {responding, not responding, busy} in a


phone call. Then, the length of a phone call is a random variable and can be expressed
as

X (g) = t (3.1.3)

for t ≥ 0. Here, because P(X = 0) > 0 and X is continuous on (0, ∞), X is a hybrid
random variable. ♦

3.1.2 Cumulative Distribution Function

Let X be a random variable defined on the probability space (Ω, F, P). Denote the
range of X by A and denote the inverse image of B by X −1 (B) for B ⊆ A. Then,
we have
 
P_X(B) = P(X^{−1}(B)),    (3.1.4)

which implies that the probability of an event is equal to the probability of the
inverse image of the event. Based on (3.1.4) and the probability measure P of the
original probability space (Ω, F, P), we can obtain the probability measure P X of
the probability space induced by the random variable X .

Example 3.1.6 Consider a rolling of a fair die and assume P(ω) = 1/6 for ω ∈ Ω =
{1, 2, . . . , 6}. Define a random variable X by X(ω) = −1 for ω = 1, X(ω) = −2
for ω = 2, 3, 4, and X(ω) = −3 for ω = 5, 6. Then, we have A = {−3, −2, −1}.
Logically, X^{−1}({−3}) = {5, 6}, X^{−1}({−2}) = {2, 3, 4}, and X^{−1}({−1}) = {1}. Now,
the probability measure of random variable X can be obtained as P_X({−3}) =
P(X^{−1}({−3})) = P({5, 6}) = 1/3, P_X({−2}) = P({2, 3, 4}) = 1/2, and P_X({−1}) =
P({1}) = 1/6. ♦
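The computation in Example 3.1.6 is a pushforward of the measure P through X, and can be mirrored mechanically; exact fractions keep the arithmetic transparent (a minimal sketch, variable names are illustrative):

```python
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}      # fair die
X = {1: -1, 2: -2, 3: -2, 4: -2, 5: -3, 6: -3}    # the mapping in Example 3.1.6

# Induced (pushforward) distribution: P_X({x}) = P(X^{-1}({x}))
P_X = {}
for w, pw in P.items():
    P_X[X[w]] = P_X.get(X[w], Fraction(0)) + pw
```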

We now describe in detail the distribution of a random variable based on (3.1.4),


and then define a function with which the distribution can be managed more conve-
niently. Consider a random variable X defined on the probability space (Ω, F, P)
and the range A ⊆ R of X . When B ∈ B(A), the set

X −1 (B) = {ω : X (ω) ∈ B} (3.1.5)

is a subset of Ω and, at the same time, an element of the event space F due to
the definition of a random variable. Based on the set X −1 (B) shown in (3.1.5), the
distribution of the random variable X can be defined as follows:

Definition 3.1.5 (distribution) The set function


 
P X (B) = P X −1 (B)
= P ({ω : X (ω) ∈ B}) (3.1.6)

for B ∈ B(A) represents the probability measure of X and is called the distribution
of the random variable X , where A is the range of X and B(A) is the Borel field of
A.

In essence, the distribution of X is a function representing the probabilistic char-


acteristics of the random variable X . The probability measure P X in (3.1.6) induces a
new probability space ( A, B(A), P X ): a consequence is that we can now deal not with
the original probability space (Ω, F, P) but with the equivalent probability space
(A, B(A), P X ), where the sample points are all real numbers. Figure 3.1 shows the
relationship (3.1.6).
The distribution of a random variable can be described by the pmf or pdf as we
have observed in Chap. 2. First, for a discrete random variable X with range A, the
pmf p X of X can be obtained as

p X (x) = P X ({x}), x ∈ A (3.1.7)

Fig. 3.1 The distribution of random variable X: P_X(B) = P({ω : X(ω) ∈ B}) = P(X^{−1}(B))



from the distribution P X , which in turn can be expressed as



P_X(B) = Σ_{x∈B} p_X(x), B ∈ B(A)    (3.1.8)

in terms of the pmf p X . For a continuous random variable X with range A and pdf
f X , we have

P_X(B) = ∫_B f_X(x)dx, B ∈ B(A),    (3.1.9)

which is the counterpart of (3.1.8): note that the counterpart of (3.1.7) does not exist
for a continuous random variable.

Definition 3.1.6 (cumulative distribution function) Assume a random variable X on


a sample space Ω, and let A x = {s : s ∈ Ω, X (s) ≤ x} for a real number x. Then,
we have P ( A x ) = P(X ≤ x) = P X ((−∞, x]) and the function

FX (x) = P X ((−∞, x]) (3.1.10)

is called the distribution function or cumulative distribution function (cdf) of the


random variable X .

The cdf FX (x) denotes the probability that X is located in the half-open interval
(−∞, x]. For example, FX (2) is the probability that X is in the half-open interval
(−∞, 2], i.e., the probability of the event {−∞ < X ≤ 2}.
The pmf and cdf for a discrete random variable and the pdf and cdf for a continuous
random variable can be expressed in terms of each other, as we shall see in (3.1.24),
(3.1.32), and (3.1.33) later. The probabilistic characteristics of a random variable can
be described by the cdf, pdf, or pmf: these three functions are all frequently indicated
as the distribution function, probability distribution function, or probability function.
In some cases, only the cdf is called the distribution function, and probability function
in the strict sense only indicates the probability measure P as mentioned in Sect. 2.2.3.
In some fields such as statistics, the name distribution function is frequently used
while the name cdf is widespread in other fields including engineering.

Example 3.1.7 Let the outcome from a rolling of a fair die be X . Then, we can
obtain the cdf FX (x) = P(X ≤ x) of X as

F_X(x) = P(X ≤ x) = { 1, x ≥ 6;  i/6, i ≤ x < i + 1, i = 1, 2, 3, 4, 5;  0, x < 1,    (3.1.11)

which is shown in Fig. 3.2. ♦



Fig. 3.2 The cdf F_X(x) of the number X resulting from a rolling of a fair die

Fig. 3.3 The cdf F_Y(x) of the coordinate Y chosen randomly in the interval [0, 1]

Example 3.1.8 Let the coordinate Y be a number chosen randomly in the interval
[0, 1]. Then, P(Y ≤ x) = 1, x, and 0 when x ≥ 1, 0 ≤ x < 1, and x < 0, respec-
tively. Therefore, the cdf of Y is

F_Y(x) = { 1, x ≥ 1;  x, 0 ≤ x < 1;  0, x < 0.    (3.1.12)

Figure 3.3 shows the cdf FY (x). ♦

Theorem 3.1.1 The cdf is a non-decreasing function: that is, F (x1 ) ≤ F (x2 ) when
x1 < x2 for a cdf F. In addition, we have F(∞) = 1 and F(−∞) = 0.

From the definition of the cdf and probability measure, it is clear that

P(X > x) = 1 − FX (x) (3.1.13)

because P ( Ac ) = 1 − P(A) and that

P(a < X ≤ b) = FX (b) − FX (a) (3.1.14)

for a ≤ b. In addition, at a discontinuity point x_D of a cdf F_X(x), we have F_X(x_D) = F_X(x_D^+) and

F_X(x_D) − F_X(x_D^−) = P(X = x_D)    (3.1.15)

Fig. 3.4 An example of the cdf of a hybrid random variable, with a jump of P(X = x_D) = F_X(x_D) − F_X(x_D^−) at the discontinuity point x_D

for a discrete or a hybrid random variable as shown in Fig. 3.4. On the other hand,
the probability of one point is 0 for a continuous random variable: in other words,
we have

P(X = x) = 0 (3.1.16)

and
 
F_X(x) − F_X(x^−) = 0    (3.1.17)

for a continuous random variable X .

Theorem 3.1.2 The cdf is continuous from the right. That is,

F_X(x) = lim_{ε→0} F_X(x + ε), ε > 0    (3.1.18)

for a cdf FX .


Proof Consider a sequence {α_i}_{i=1}^∞ such that α_{i+1} ≤ α_i and lim_{i→∞} α_i = 0. Then,

lim_{ε→0} F_X(x + ε) − F_X(x) = lim_{i→∞} {F_X(x + α_i) − F_X(x)}
                             = lim_{i→∞} P(X ∈ (x, x + α_i]).    (3.1.19)

Now, we have lim_{i→∞} P(X ∈ (x, x + α_i]) = lim_{i→∞} P_X((x, x + α_i]) = P_X(lim_{i→∞} (x, x + α_i]) from (2.A.1) because {(x, x + α_i]}_{i=1}^∞ is a monotonic sequence. Subsequently, we have P_X(lim_{i→∞} {(x, x + α_i]}) = P_X(∩_{i=1}^∞ (x, x + α_i]) = P_X(∅) from (1.5.9) and ∩_{i=1}^∞ (x, x + α_i] = ∅ as shown, for instance, in Example 1.5.9. In other words,

lim_{i→∞} P(X ∈ (x, x + α_i]) = 0,    (3.1.20)

completing the proof. ♠

Example 3.1.9 (Loeve 1977) Let the probability measure and corresponding cdf be
P and F, respectively. When g is an integrable function,
 
∫ g dP  or  ∫ g dF    (3.1.21)

is called the Lebesgue-Stieltjes integral and is often written as, for instance,

∫_{[a,b)} g dP = ∫_a^b g dF.    (3.1.22)

When F(x) = x for x ∈ [0, 1], the measure P is called the Lebesgue measure as
mentioned in Definition 2.A.7, and

∫_{[a,b)} g dx = ∫_a^b g dx    (3.1.23)

is the Lebesgue integral. If g is continuous on [a, b], then the Lebesgue-Stieltjes
integral ∫_a^b g dF is the Riemann-Stieltjes integral, and the Lebesgue integral ∫_a^b g dx
is the Riemann integral. ♦

As we have already seen in Examples 3.1.7 and 3.1.8, subscripts are used to
distinguish the cdf’s of several random variables as in FX and FY . In addition, when
the cdf FX and pdf f X is for the random variable X with the distribution P X , it is
denoted by X ∼ P X , X ∼ FX , or X ∼ f X . For example, X ∼ P(λ) means that the
random variable X follows the Poisson distribution with parameter λ, X ∼ U [a, b)
means that the distribution of the random variable X is the uniform distribution over
[a, b), and Y ∼ f Y (t) = e−t u(t) means that the pdf of the random variable Y is
f Y (t) = e−t u(t).

Theorem 3.1.3 A cdf may have at most countably many jump discontinuities.

Proof Assume the cdf F(x) is discontinuous at x_0. Denote by D_n the set of discontinuities with the jump in the half-open interval (1/(n+1), 1/n], where n is a natural number. Then, the number of elements in D_n is at most n, because otherwise F(∞) − F(−∞) > 1. In other words, there exists at most one discontinuity with jump between 1/2 and 1, at most two discontinuities with jump between 1/3 and 1/2, . . ., at most n − 1 discontinuities with jump between 1/n and 1/(n−1), . . .. Therefore the number of discontinuities is at most countable. ♠

Theorem 3.1.3 is a special case of the more general result that a function which is
continuous from the right-hand side or left-hand side at all points and a monotonic
real function may have, at most, countably many jump discontinuities.
Based on the properties of the cdf, we can now redefine the continuous, discrete,
and hybrid random variables as follows:

Definition 3.1.7 (discrete random variable; continuous random variable; hybrid


random variable) A continuous, discrete, or hybrid random variable is a random
variable whose cdf is a continuous, a step-like function, or a discontinuous but not a
step-like function, respectively.

Here, when a function is increasing only at some points and is constant in a closed
interval not containing the points, the function is called a step-like function. The cdf
shown in Fig. 3.4 is an example of a hybrid random variable which is not continuous
at a point x D .

3.1.3 Probability Density Function and Probability Mass


Function

In characterizing the probabilistic properties of continuous and discrete random vari-


ables, we can use a pdf and a pmf, respectively. In addition, the cdf can also be
employed for the three classes of random variables: the continuous, discrete, and
hybrid random variables.
Let us denote the cdf of a random variable X by FX , the pdf by f X when X
is a continuous random variable, and the pmf by p X when X is a discrete random
variable. Then, the cdf FX (x) = P X ((−∞, x]) can be expressed as
F_X(x) = { ∫_{−∞}^x f_X(y)dy, if X is a continuous random variable;
          Σ_{y=−∞}^x p_X(y), if X is a discrete random variable.    (3.1.24)

When X is a hybrid random variable, we have for 0 < α < 1


F_X(x) = α Σ_{k=−∞}^x p_X(k) + (1 − α) ∫_{−∞}^x f_X(y)dy,    (3.1.25)

which is sufficiently general for us to deal with in this book. Note that, as described
in Appendix 3.1, the most general cdf is a weighted sum of an absolutely continuous
function, a discrete function, and
 a singular function.
The probability P_X(B) = ∫_B dF_X(x) of an event B can be obtained as

P_X(B) = { ∫_B f_X(x)dx, for a continuous random variable;
          Σ_{x∈B} p_X(x), for a discrete random variable.    (3.1.26)

Example 3.1.10 Consider a Rayleigh random variable R. Then, from the pdf f_R(x) = (x/α²) exp(−x²/(2α²)) u(x), the cdf F_R(x) = ∫_{−∞}^{x} (t/α²) exp(−t²/(2α²)) u(t) dt is easily obtained as

F_R(x) = {1 − exp(−x²/(2α²))} u(x).   (3.1.27)

When α = 1, the probability of the event {1 < R < 2} is ∫_1^2 f_R(t) dt = F_R(2) − F_R(1) = e^{−1/2} − e^{−2} ≈ 0.4712. ♦
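As a quick numerical sanity check of (3.1.27) (a Python sketch added for illustration, not part of the original text), we can integrate the Rayleigh pdf over (1, 2) by a midpoint Riemann sum and compare with the closed-form difference F_R(2) − F_R(1):

```python
import math

def rayleigh_pdf(x, alpha=1.0):
    # f_R(x) = (x / alpha^2) exp(-x^2 / (2 alpha^2)) u(x)
    return (x / alpha**2) * math.exp(-x**2 / (2 * alpha**2)) if x > 0 else 0.0

def rayleigh_cdf(x, alpha=1.0):
    # F_R(x) = {1 - exp(-x^2 / (2 alpha^2))} u(x), Eq. (3.1.27)
    return 1.0 - math.exp(-x**2 / (2 * alpha**2)) if x > 0 else 0.0

# Midpoint-rule integration of the pdf over (1, 2) for alpha = 1
n = 100000
h = (2.0 - 1.0) / n
p_numeric = sum(rayleigh_pdf(1.0 + (i + 0.5) * h) for i in range(n)) * h

# Closed form: F_R(2) - F_R(1) = e^{-1/2} - e^{-2}
p_exact = rayleigh_cdf(2.0) - rayleigh_cdf(1.0)
print(round(p_exact, 4))                 # 0.4712
print(abs(p_numeric - p_exact) < 1e-6)   # True
```

The agreement of the two values illustrates that the cdf is indeed the integral of the pdf, as in (3.1.24).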

Theorem 3.1.4 The cdf FX satisfies

1 − FX (x) = FX (−x) (3.1.28)

when the pdf f X is an even function.

Proof First, P(X > x) = ∫_x^{∞} f_X(y) dy = ∫_{−x}^{−∞} f_X(−t)(−dt) = ∫_{−∞}^{−x} f_X(t) dt = F_X(−x) because f_X(x) = f_X(−x). Recollecting (3.1.13), we get (3.1.28). ♠
Example 3.1.11 Consider the pdf's f_L(x) = k e^{−kx}/(1 + e^{−kx})² for k > 0 of the logistic distribution (Balakrishnan 1992) and f_D(x) = (λ/2) e^{−λ|x|} for λ > 0 of the double exponential distribution. The cdf's of these distributions are

F_L(x) = 1/(1 + e^{−kx})   (3.1.29)

and

F_D(x) =
  (1/2) e^{λx},       x ≤ 0,
  1 − (1/2) e^{−λx},  x ≥ 0,      (3.1.30)

respectively, for which Theorem 3.1.4 is easily confirmed. ♦

Example 3.1.12 We have the cdf

F_C(x) = 1/2 + (1/π) tan⁻¹((x − β)/α)   (3.1.31)

for the Cauchy distribution with pdf f_C(r) = (α/π){(r − β)² + α²}⁻¹ shown in (2.5.28). ♦
From (3.1.24), we can easily see that the pdf and pmf can be obtained as

f_X(x) = (d/dx) F_X(x) = lim_{ε→0} (1/ε) P(x < X ≤ x + ε)   (3.1.32)

and

p X (xi ) = FX (xi ) − FX (xi−1 ) (3.1.33)

from the cdf when X is a continuous random variable and a discrete random variable,
respectively.
For a discrete random variable, a pmf is normally used. Yet, we can also define the pdf of a discrete random variable using the impulse function, as we have observed in (2.5.37). Specifically, let the cdf and pmf of a discrete random variable X be F_X and p_X, respectively. Then, based on F_X(x) = Σ_{x_i ≤ x} p_X(x_i) = Σ_i p_X(x_i) u(x − x_i), we can regard

f_X(x) = (d/dx) Σ_i p_X(x_i) u(x − x_i) = Σ_i p_X(x_i) δ(x − x_i)   (3.1.34)

as the pdf of X.
Example 3.1.13 For the pdf f(x) = 2x for x ∈ [0, 1] and 0 otherwise, sketch the cdf.

Solution Obtaining the cdf F(x) = ∫_{−∞}^{x} f(t) dt, we get

F(x) =
  0,   x < 0,
  x²,  0 ≤ x < 1,
  1,   x ≥ 1,      (3.1.35)

which is shown in Fig. 3.5 together with the pdf. ♦

Example 3.1.14 For the pdf

f(x) = (1/2){u(x) − u(x − 1)} + (1/3) δ(x − 1) + (1/6) δ(x − 2),   (3.1.36)

obtain and sketch the cdf.


Fig. 3.5 The cdf F(x) for the pdf f(x) = 2x u(x) u(1 − x)

Fig. 3.6 The pdf f(x) = (1/2){u(x) − u(x − 1)} + (1/3) δ(x − 1) + (1/6) δ(x − 2) and cdf F(x)

Solution First, we get the cdf F(x) = ∫_{−∞}^{x} f(t) dt as

F(x) =
  0,    x < 0,
  x/2,  0 ≤ x < 1,
  5/6,  1 ≤ x < 2,
  1,    2 ≤ x,      (3.1.37)

which is shown in Fig. 3.6 together with the pdf (3.1.36). ♦

Example 3.1.15 Let X be the face of a die from a rolling. Then, the cdf of X is F_X(x) = Σ_{i=1}^{6} (1/6) u(x − i), from which we get the pdf

f_X(x) = (1/6) Σ_{i=1}^{6} δ(x − i)   (3.1.38)

of X by differentiation. In addition,

p_X(i) =
  1/6,  i = 1, 2, …, 6,
  0,    otherwise      (3.1.39)

is the pmf of X. ♦

Example 3.1.16 The function (2.5.37) addressed in Example 2.5.23 is the pdf of a
hybrid random variable. ♦
Example 3.1.17 A box contains G green and B blue balls. Assume we take one ball from the box n times without replacement.¹ Obtain the pmf of the number X of green balls among the n balls taken from the box.

Solution We easily get the probability of X = k as

P(X = k) = (_G C_k)(_B C_{n−k}) / (_{G+B} C_n)   (3.1.40)

for {0 ≤ k ≤ G, 0 ≤ n − k ≤ B} or, equivalently, for max(0, n − B) ≤ k ≤ min(n, G). Thus, the pmf of X is

p_X(k) =
  (_G C_k)(_B C_{n−k}) / (_{G+B} C_n),  k = ǩ, ǩ + 1, …, min(n, G),
  0,                                    otherwise,      (3.1.41)

where² ǩ = max(0, n − B). In addition, (3.1.41) will become p_X(k) = 1 for k = 0 and 0 for k ≠ 0 when G = 0, and p_X(k) = 1 for k = n and 0 for k ≠ n when B = 0. The distribution with replacement of the balls will be addressed in Exercise 3.5. ♦
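The pmf (3.1.41) can be sketched in Python with `math.comb` (an illustrative snippet, not from the text; note that `math.comb(p, q)` equals 0 when q > p, mirroring the convention ₚC_q = 0 of the footnote, while negative q must be excluded by the caller):

```python
from math import comb

def hypergeom_pmf(k, G, B, n):
    # P(X = k) = C(G, k) C(B, n - k) / C(G + B, n), Eq. (3.1.40)
    return comb(G, k) * comb(B, n - k) / comb(G + B, n)

# Example: G = 3 green, B = 4 blue, n = 2 draws without replacement
pmf = [hypergeom_pmf(k, 3, 4, 2) for k in range(0, 3)]   # k = 0, 1, 2
# pmf values are 6/21, 12/21, 3/21
print(abs(sum(pmf) - 1.0) < 1e-12)   # True: the pmf sums to one
```

Summing the pmf over all admissible k confirms that (3.1.41) is a valid pmf.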
For a random variable X with pdf f X and cdf FX , noting that the cdf is continuous
from the right-hand side, the probability of the event {x1 < X ≤ x2 } shown in (3.1.14)
can be obtained as

P(x₁ < X ≤ x₂) = F_X(x₂) − F_X(x₁) = ∫_{x₁⁺}^{x₂⁺} f_X(x) dx.   (3.1.42)

Example 3.1.18 For the random variable Z with pdf f_Z(z) = (1/√(2π)) exp(−z²/2), we have P(|Z| ≤ 1) = ∫_{−1}^{1} f_Z(z) dz ≈ 0.6826, P(|Z| ≤ 2) ≈ 0.9544, and P(|Z| ≤ 3) ≈ 0.9974. ♦
Using (3.1.42), the value F(∞) = 1 mentioned in Theorem 3.1.1 can be confirmed as ∫_{−∞}^{∞} f(x) dx = P(−∞ < X ≤ ∞) = F(∞) = 1. Let us mention that although P(x₁ ≤ X < x₂) = ∫_{x₁⁻}^{x₂⁻} f_X(x) dx, P(x₁ ≤ X ≤ x₂) = ∫_{x₁⁻}^{x₂⁺} f_X(x) dx, and

¹ The distribution of X is a hypergeometric distribution.
² Here, 'max(0, n − B) ≤ k ≤ min(n, G)' can be replaced with 'all integers k' by noting that ₚC_q = 0 for q < 0 or q > p when p is a non-negative integer and q is an integer, from Table 1.4.
P(x₁ < X < x₂) = ∫_{x₁⁺}^{x₂⁻} f_X(x) dx are slightly different from (3.1.42), these four probabilities are all equal to each other unless the pdf f_X contains impulse functions at x₁ or x₂.
As it is observed, for instance, in Example 3.1.15, considering a continuous ran-
dom variable with the pdf is very similar to considering a discrete random variable
with the pmf. Therefore, we will henceforth focus on discussing a continuous random
variable with the pdf. One final point is that

lim_{x→±∞} f(x) = 0   (3.1.43)

and

lim_{x→±∞} f′(x) = 0   (3.1.44)

hold true for all the pdf's f we will discuss in this book.

3.2 Functions of Random Variables and Their Distributions

In this section, when the cdf FX , pdf f X , or pmf p X of a random variable X is known,
we obtain the probability functions of a new random variable Y = g(X ), where g is
a measurable function (Middleton 1960).

3.2.1 Cumulative Distribution Function

First, the cdf FY (v) = P(Y ≤ v) = P(g(X ) ≤ v) of Y = g(X ) can be obtained as

FY (v) = P(x : g(x) ≤ v, x ∈ A), (3.2.1)

where A is the sample space of the random variable X . Using (3.2.1), the pdf or pmf
of Y can be obtained subsequently: specifically, we can obtain the pdf of Y as

f_Y(v) = (d/dv) F_Y(v)   (3.2.2)

when Y is a continuous random variable, and the pmf of Y as

pY (v) = FY (v) − FY (v − 1) (3.2.3)



when Y is a discrete random variable. The result (3.2.3) is for a random variable whose
range is a subset of integers as described after Definition 3.1.4: more generally, we
can write it as

pY (vi ) = FY (vi ) − FY (vi−1 ) (3.2.4)

when A = {v1 , v2 , . . .} instead of A = {0, 1, . . .}.

Example 3.2.1 Obtain the cdf F_Y of Y = aX + b in terms of the cdf F_X of X, where a ≠ 0.

Solution We have the cdf F_Y(y) = P(Y ≤ y) = P(aX + b ≤ y) as

F_Y(y) =
  P(X ≤ (y − b)/a),   a > 0,
  P(X ≥ (y − b)/a),   a < 0
=
  F_X((y − b)/a),                          a > 0,
  P(X = (y − b)/a) + 1 − F_X((y − b)/a),   a < 0      (3.2.5)

by noting that the set {Y ≤ y} is equivalent to the set {aX + b ≤ y}. ♦

Example 3.2.2 When the random variable X has the cdf

F_X(x) =
  0,  x ≤ 0,
  x,  0 ≤ x ≤ 1,
  1,  x ≥ 1,      (3.2.6)

the cdf of Y = 2X + 1 is

F_Y(y) =
  0,          y ≤ 1,
  (y − 1)/2,  1 ≤ y ≤ 3,
  1,          y ≥ 3,      (3.2.7)

which are shown in Fig. 3.7. ♦

Fig. 3.7 The cdf F_X(x) of X and cdf F_Y(y) of Y = 2X + 1



Example 3.2.3 For a continuous random variable X with cdf F_X, obtain the cdf of Y = 1/X.

Solution We get

F_Y(y) = P(X(yX − 1) ≥ 0)
=
  P(X ≤ 0 or X ≥ 1/y),  y > 0,
  P(X ≤ 0),             y = 0,
  P(1/y ≤ X ≤ 0),       y < 0
=
  F_X(0) + 1 − F_X(1/y),  y > 0,
  F_X(0),                 y = 0,
  F_X(0) − F_X(1/y),      y < 0,      (3.2.8)

by noting that {1/X ≤ y} = {X ≤ yX²} = {(yX − 1)X ≥ 0}. ♦
Example 3.2.4 Obtain the cdf of Y = aX² in terms of the cdf F_X of X when a > 0.

Solution Because the set {Y ≤ y} is equivalent to the set {aX² ≤ y}, the cdf of Y can be obtained as F_Y(y) = P(Y ≤ y) = P(X² ≤ y/a), i.e.,

F_Y(y) =
  0,                            y < 0,
  P(−√(y/a) ≤ X ≤ √(y/a)),      y ≥ 0,      (3.2.9)

which can be rewritten as

F_Y(y) =
  0,                                               y < 0,
  F_X(√(y/a)) − F_X(−√(y/a)) + P(X = −√(y/a)),     y ≥ 0
= {F_X(√(y/a)) − F_X(−√(y/a)) + P(X = −√(y/a))} u(y)   (3.2.10)

in terms of the cdf F_X of X. In (3.2.10), it is assumed u(0) = 1. ♦


Example 3.2.5 Based on the result of Example 3.2.4, obtain the cdf of Y = X² when F_X(x) = {1 − (2/3) e^{−x}} u(x).

Solution For convenience, let α = ln(2/3) so that e^{α} = 2/3. Recollecting that P(X = 0) = F_X(0⁺) − F_X(0⁻) = 1/3 − 0 = 1/3 in (3.2.10), we have F_Y(x) = 0 when x < 0 and F_Y(0) = {1 − exp(α)} − {1 − exp(α)} + P(X = 0) = 1/3 when x = 0. When x > 0, recollecting that F_X(√x) = {1 − exp(−√x + α)} u(√x) = 1 − exp(−√x + α) and F_X(−√x) = {1 − exp(√x + α)} u(−√x) = 0, we get F_Y(x) = F_X(√x) − F_X(−√x) = 1 − exp(−√x + α). In summary,
Fig. 3.8 The cdf F_X(x) = {1 − (2/3) e^{−x}} u(x) and cdf F_Y(x) = {1 − (2/3) e^{−√x}} u(x) of Y = X²

F_Y(x) = {1 − (2/3) e^{−√x}} u(x),   (3.2.11)

which is shown in Fig. 3.8 together with F_X(x). ♦



Example 3.2.6 Express the cdf F_Y of Y = √X in terms of the cdf F_X of X when P(X < 0) = 0.

Solution We have F_Y(y) = 0 for y < 0 and F_Y(y) = P(X ≤ y²) for y ≥ 0, i.e.,

F_Y(y) = F_X(y²) u(y)   (3.2.12)

from F_Y(y) = P(Y ≤ y) = P(√X ≤ y). ♦

Example 3.2.7 Recollecting that the probability of a singleton set is 0 for a continuous random variable X, the cdf of Y = |X| can be obtained as F_Y(y) = P(Y ≤ y) = P(|X| ≤ y), i.e.,

F_Y(y) =
  0,                y < 0,
  P(−y ≤ X ≤ y),    y ≥ 0
=
  0,                                    y < 0,
  F_X(y) − F_X(−y) + P(X = −y),         y ≥ 0
= {F_X(y) − F_X(−y)} u(y)   (3.2.13)

in terms of the cdf F_X of X. Examples of the cdf F_X(x) and F_Y(y) are shown in Fig. 3.9. ♦

Example 3.2.8 When the cdf of the input X to the limiter

g(x) =
  b,    x ≥ b,
  x,    −b ≤ x ≤ b,
  −b,   x < −b      (3.2.14)
Fig. 3.9 The cdf F_X(x) of X and the cdf F_Y(y) of Y = |X|

is FX , obtain the cdf FY of the output Y = g(X ).

Solution First, when y < −b and y ≥ b, we have F_Y(y) = 0 and F_Y(y) = F_Y(b) = 1, respectively. Next, when −b ≤ y < b, we have F_Y(y) = F_X(y) from F_Y(y) = P(Y ≤ y) = P(X ≤ y). Thus, we eventually have

F_Y(y) =
  1,        y ≥ b,
  F_X(y),   −b ≤ y < b,
  0,        y < −b,      (3.2.15)

which is continuous from the right-hand side at any point y and discontinuous at y = ±b in general. ♦

Example 3.2.9 Obtain the cdf of Y = g(X) when X ∼ U(−1, 1) and

g(x) =
  1/2,    x ≥ 1/2,
  x,      −1/2 ≤ x < 1/2,
  −1/2,   x < −1/2.      (3.2.16)

Solution The cdf of Y = g(X) can be obtained as

F_Y(y) =
  1,            y ≥ 1/2,
  (y + 1)/2,    −1/2 ≤ y < 1/2,
  0,            y < −1/2      (3.2.17)

using (3.2.15), which is shown in Fig. 3.10. ♦

3.2.2 Probability Density Function

Let us first introduce the following theorem which is quite useful in dealing with the
differentiation of an integrated bi-variate function:
Fig. 3.10 The cdf F_X(x), limiter g(x), and cdf F_Y(y) of Y = g(X) when X ∼ U(−1, 1)

Theorem 3.2.1 Assume that a(x) and b(x) are integrable functions and that both g(t, x) and ∂g(t, x)/∂x are continuous in x and t. Then, we have

(d/dx) ∫_{a(x)}^{b(x)} g(t, x) dt = g(b(x), x) (db(x)/dx) − g(a(x), x) (da(x)/dx) + ∫_{a(x)}^{b(x)} (∂g(t, x)/∂x) dt,   (3.2.18)

which is called the Leibnitz's rule.

Example 3.2.10 Assume a(x) = x, b(x) = x², and g(t, x) = 2t + x. Then, (d/dx) ∫_{a(x)}^{b(x)} g(t, x) dt = 4x³ + 3x² − 4x from ∫_x^{x²} (2t + x) dt = x⁴ + x³ − 2x². On the other hand, ∫_{a(x)}^{b(x)} (∂g(t, x)/∂x) dt = x² − x from ∂g(t, x)/∂x = 1. Therefore, g(b(x), x) (db(x)/dx) − g(a(x), x) (da(x)/dx) + ∫_{a(x)}^{b(x)} (∂g(t, x)/∂x) dt = 2x(2x² + x) − 3x + (x² − x) = 4x³ + 3x² − 4x from g(b(x), x) = 2x² + x, g(a(x), x) = 3x, da(x)/dx = 1, and db(x)/dx = 2x. ♦
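The two sides of (3.2.18) can be compared numerically. The sketch below (an illustration, not from the text) differentiates ∫_x^{x²} (2t + x) dt by a central difference and checks it against the value 4x³ + 3x² − 4x derived in Example 3.2.10:

```python
def integral(x, n=20000):
    # I(x) = ∫_{x}^{x^2} (2t + x) dt, evaluated by the midpoint rule
    # (exact here up to round-off, since the integrand is linear in t)
    a, b = x, x * x
    h = (b - a) / n
    return sum(2 * (a + (i + 0.5) * h) + x for i in range(n)) * h

def d_integral(x, eps=1e-4):
    # central-difference approximation of dI/dx
    return (integral(x + eps) - integral(x - eps)) / (2 * eps)

x = 1.5
leibnitz = 4 * x**3 + 3 * x**2 - 4 * x   # right-hand side of (3.2.18)
print(abs(d_integral(x) - leibnitz) < 1e-3)   # True
```

The central difference approximates the left-hand side of (3.2.18) directly, so the agreement confirms the example's algebra.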

3.2.2.1 One-to-One Transformations

We attempt to obtain the cdf F_Y(y) = P(Y ≤ y) of Y = g(X) when the pdf of X is f_X. First, if g is differentiable and increasing, then the cdf F_Y(y) = F_X(g⁻¹(y)) is

F_Y(y) = ∫_{−∞}^{g⁻¹(y)} f_X(t) dt   (3.2.19)

because {Y ≤ y} = {X ≤ g⁻¹(y)}, where g⁻¹ is the inverse of g. Thus, the pdf of Y = g(X) is f_Y(y) = (d/dy) F_Y(y) = f_X(g⁻¹(y)) (dg⁻¹(y)/dy), i.e.,
f_Y(y) = f_X(x) (dx/dy)   (3.2.20)

with x = g⁻¹(y). Similarly, if g is differentiable and decreasing, then the cdf of Y is

F_Y(y) = ∫_{g⁻¹(y)}^{∞} f_X(t) dt   (3.2.21)

from F_Y(y) = P(Y ≤ y) = P(X ≥ g⁻¹(y)), and the pdf is f_Y(y) = (d/dy) F_Y(y) = −f_X(g⁻¹(y)) (dg⁻¹(y)/dy), i.e.,

f_Y(y) = −f_X(x) (dx/dy).   (3.2.22)

Combining (3.2.20) and (3.2.22), we have the following theorem:

Theorem 3.2.2 When g is a differentiable and decreasing function or a differentiable and increasing function, the pdf of Y = g(X) is

f_Y(y) = f_X(x)/|g′(x)|, evaluated at x = g⁻¹(y),   (3.2.23)

where f_X is the pdf of X.

The result (3.2.23) can be written as f_Y(y) = f_X(g⁻¹(y))/|g′(g⁻¹(y))|, as f_Y(y) = f_X(x) |dx/dy| with x = g⁻¹(y), or as

f_Y(y)|dy| = f_X(x)|dx|.   (3.2.24)

The formula (3.2.24) represents the conservation or invariance of probability: the probability f_X(x)|dx| of the region |dx| of the random variable X is the same as the probability f_Y(y)|dy| of the region |dy| of the random variable Y when the region |dy| of Y is the image of the region |dx| of X under the function Y = g(X).

Example 3.2.11 For a non-zero real number a, let Y = aX + b. Then, noting that the inverse function of y = g(x) = ax + b is x = g⁻¹(y) = (y − b)/a and that |g′(g⁻¹(y))| = |a|, we get

f_{aX+b}(y) = (1/|a|) f_X((y − b)/a)   (3.2.25)

Fig. 3.11 The pdf f_X(x) and pdf f_Y(y) of Y = 2X + 1 when X ∼ U[0, 1)

from (3.2.23). This result is the same as f_{aX+b}(y) = (d/dy) F_{aX+b}(y), the derivative of the cdf (3.2.5) obtained in Example 3.2.1. Figure 3.11 shows the pdf f_X(x) and pdf f_Y(y) = (1/2) u(y − 1) u(3 − y) of Y = 2X + 1 when X ∼ U[0, 1). ♦

Example 3.2.12 Obtain the pdf of Y = cX when X ∼ G(α, β) and c > 0.

Solution Using (2.5.31) and (3.2.25), we get

f_{cX}(y) = (1/c) · (1/(β^α Γ(α))) (y/c)^{α−1} exp(−y/(cβ)) u(y/c)
          = (1/((cβ)^α Γ(α))) y^{α−1} exp(−y/(cβ)) u(y).   (3.2.26)

In other words, cX ∼ G(α, cβ) when X ∼ G(α, β) and c > 0. ♦

Example 3.2.13 Consider Y = 1/X. Because the inverse function of y = g(x) = 1/x is x = g⁻¹(y) = 1/y and |g′(g⁻¹(y))| = y², we get

f_{1/X}(y) = (1/y²) f_X(1/y)   (3.2.27)

from (3.2.23), which can also be obtained by differentiating (3.2.8). Figure 3.12 shows the pdf f_X(x) and pdf f_Y(y) of Y = 1/X when X ∼ U[0, 1). ♦

Example 3.2.14 When X ∼ C(α), obtain the distribution of Y = 1/X.

Solution Noting that f_X(x) = (1/π) α/(x² + α²), we get f_Y(y) = (1/(απ)) · 1/(y² + 1/α²) from (3.2.27). In other words, if X ∼ C(α), then 1/X ∼ C(1/α). ♦


Example 3.2.15 Express the pdf f_Y of Y = √X in terms of the pdf f_X of X.

Solution When y < 0, there is no solution to y = √x, and thus f_Y(y) = 0. When y > 0, the solution to y = √x is x = y² and g′(x) = 1/(2√x). Therefore,
Fig. 3.12 The pdf f_X(x) and pdf f_Y(y) of Y = 1/X when X ∼ U[0, 1)

 
f_{√X}(y) = 2y f_X(y²) u(y),   (3.2.28)

which is the same as f_{√X}(y) = 2y f_X(y²) u(y) + F_X(0) δ(y), obtainable by differentiating F_{√X}(y) = F_X(y²) u(y) shown in (3.2.12), except at y = 0. Note that, for √X to be meaningful, we should have P(X < 0) = 0. Thus, when X is a continuous random variable, we have F_X(0) = P(X ≤ 0) = P(X = 0) = 0 and, consequently, F_X(0) δ(y) = 0. We then easily obtain³ ∫_{−∞}^{∞} f_{√X}(y) dy = ∫_0^{∞} 2y f_X(y²) dy = ∫_0^{∞} f_X(t) dt = ∫_{−∞}^{∞} f_X(t) dt = 1 because f_X(x) = 0 for x < 0 from P(X < 0) = 0. ♦

Example 3.2.16 Obtain the pdf of Y = √X when the pdf of X is

f_X(x) =
  x,      0 ≤ x < 1,
  2 − x,  1 ≤ x < 2,
  0,      x < 0 or x ≥ 2.      (3.2.29)

Solution Noting that

f_X(y²) =
  y²,      0 ≤ y² < 1,
  2 − y²,  1 ≤ y² < 2,
  0,       y² < 0 or y² ≥ 2,      (3.2.30)

we get

f_Y(y) =
  2y³,          0 ≤ y < 1,
  2y(2 − y²),   1 ≤ y < √2,
  0,            y < 0 or y ≥ √2      (3.2.31)

from (3.2.28), which is shown in Fig. 3.13. ♦

³ We can equivalently obtain ∫_{−∞}^{∞} f_{√X}(y) dy = ∫_0^{∞} 2y f_X(y²) dy + F_X(0) = ∫_0^{∞} f_X(t) dt + ∫_{−∞}^{0} f_X(t) dt = ∫_{−∞}^{∞} f_X(t) dt = 1 using F_X(0) = ∫_{−∞}^{0} f_X(t) dt.
Fig. 3.13 The pdf f_Y(y) of Y = √X for the pdf f_X(x) of X

Fig. 3.14 The pdf f_X(x) of X ∼ N(0, σ²) and pdf of the log-normal random variable Y = e^X

Example 3.2.17 Express the pdf f_Y of Y = e^X in terms of the pdf f_X of X.

Solution When y ≤ 0, there is no solution to y = e^x, and thus f_Y(y) = 0. When y > 0, the solution to y = e^x is x = ln y and g′(x) = e^x. We therefore get

f_{e^X}(y) = (1/y) f_X(ln y) u(y),   (3.2.32)

assuming u(0) = 0. ♦
 
Example 3.2.18 When X ∼ N(m, σ²), obtain the distribution of Y = e^X.

Solution Noting that f_X(x) = (1/√(2πσ²)) exp(−(x − m)²/(2σ²)), we get

f_Y(y) = (1/(√(2πσ²) y)) exp(−(ln y − m)²/(2σ²)) u(y),   (3.2.33)

which is called the log-normal pdf. Figure 3.14 shows the pdf f_X(x) of X ∼ N(0, σ²) and the pdf (3.2.33) of the log-normal random variable Y = e^X. ♦
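One way to check (3.2.33) numerically (a Python sketch, not from the text, assuming m = 0 and σ = 1 for concreteness) is to compare the formula with a finite-difference derivative of the cdf F_Y(y) = Φ((ln y − m)/σ), where Φ is the standard normal cdf:

```python
import math

def phi_cdf(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_pdf(y, m=0.0, s=1.0):
    # Eq. (3.2.33): f_Y(y) = exp(-(ln y - m)^2 / (2 s^2)) / (sqrt(2 pi s^2) y), y > 0
    return math.exp(-(math.log(y) - m) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s * y)

def lognormal_cdf(y, m=0.0, s=1.0):
    # F_Y(y) = P(e^X <= y) = P(X <= ln y) = Phi((ln y - m)/s)
    return phi_cdf((math.log(y) - m) / s)

y, eps = 2.0, 1e-5
numeric = (lognormal_cdf(y + eps) - lognormal_cdf(y - eps)) / (2 * eps)
print(abs(numeric - lognormal_pdf(y)) < 1e-6)   # True
```

This is exactly the relation f_Y(y) = (d/dy) F_Y(y) used throughout Sect. 3.2.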

3.2.2.2 General Transformations

We have discussed the probability functions of Y = g(X) in terms of those of X when the transformation g is a one-to-one correspondence, via (3.2.23) in the previous section. We now extend our discussion to the more general case where the transformation y = g(x) has multiple solutions.
Theorem 3.2.3 When the solutions to y = g(x) are x₁, x₂, …, that is, when y = g(x₁) = g(x₂) = ⋯, the pdf of Y = g(X) is obtained as

f_Y(y) = Σ_{i=1}^{∞} f_X(x_i)/|g′(x_i)|,   (3.2.34)

where f_X is the pdf of X.

We now consider some examples for the application of the result (3.2.34).

Example 3.2.19 Obtain the pdf of Y = aX² for a > 0 in terms of the pdf f_X of X.

Solution If y < 0, then the solution to y = ax² does not exist. Thus, f_Y(y) = 0. If y > 0, then the solutions to y = ax² are x₁ = √(y/a) and x₂ = −√(y/a). Thus, we have |g′(x₁)| = |g′(x₂)| = 2a√(y/a) from g′(x) = 2ax and, subsequently,

f_{aX²}(y) = (1/(2√(ay))) {f_X(√(y/a)) + f_X(−√(y/a))} u(y),   (3.2.35)

which is, as expected, the same as the result obtainable by differentiating the cdf (3.2.10) of Y = aX². ♦
Example 3.2.20 When X ∼ N(0, 1), we can easily obtain the pdf⁴ f_Y(y) = (1/√(2πy)) exp(−y/2) u(y) of Y = X² by noting that f_X(x) = (1/√(2π)) exp(−x²/2). ♦

Example 3.2.21 Express the pdf f_Y of Y = |X| in terms of the pdf f_X of X.

Solution When y < 0, there is no solution to y = |x|, and thus f_Y(y) = 0. When y > 0, the solutions to y = |x| are x₁ = y and x₂ = −y, and |g′(x)| = 1. Thus, we get

f_Y(y) = {f_X(y) + f_X(−y)} u(y),   (3.2.36)

which is the same as

f_Y(y) = (d/dy)[{F_X(y) − F_X(−y)} u(y)] = {f_X(y) + f_X(−y)} u(y)   (3.2.37)

⁴ This pdf is called the central chi-square pdf with the degree of freedom of 1. The central chi-square pdf, together with the non-central chi-square pdf, is discussed in Sect. 5.4.2.

obtained by differentiating the cdf FY (y) in (3.2.13), and then, noting that {FX (y)
−FX (−y)} δ(y) = {FX (0) − FX (0)} δ(y) = 0. ♦

Example 3.2.22 When X ∼ U[−π, π), obtain the pdf and cdf of Y = a sin(X + θ), where a > 0 and θ are constants.

Solution First, we have f_Y(y) = 0 for |y| > a. When |y| < a, letting the two solutions to y = g(x) = a sin(x + θ) in the interval [−π, π) of x be x₁ and x₂, we have f_X(x₁) = f_X(x₂) = 1/(2π). Thus, recollecting that |g′(x)| = |a cos(x + θ)| = √(a² − y²), we get

f_Y(y) = (1/(π√(a² − y²))) u(a − |y|)   (3.2.38)

from (3.2.34). Next, let us obtain the cdf F_Y(y). When 0 ≤ y ≤ a, letting α = sin⁻¹(y/a) with 0 ≤ α ≤ π/2, we have x₁ = α − θ and x₂ = π − α − θ and, consequently, F_Y(y) = P(Y ≤ y) = P(−π ≤ X ≤ x₁) + P(x₂ ≤ X < π). Now, from P(−π ≤ X ≤ x₁) = (x₁ + π)/(2π) and P(x₂ ≤ X < π) = (π − x₂)/(2π), we have F_Y(y) = (2π + 2α − π)/(2π), i.e.,

F_Y(y) = 1/2 + (1/π) sin⁻¹(y/a).   (3.2.39)

When −a ≤ y ≤ 0, letting β = sin⁻¹(y/a) with −π/2 ≤ β ≤ 0, we have x₁ = β − θ, x₂ = −π − β − θ, and x₁ − x₂ = π + 2β, and thus the cdf is F_Y(y) = P(Y ≤ y) = P(x₂ ≤ X ≤ x₁) = (π + 2β)/(2π), i.e.,

F_Y(y) = 1/2 + (1/π) sin⁻¹(y/a).   (3.2.40)

Combining F_Y(y) = 0 for y ≤ −a, F_Y(y) = 1 for y ≥ a, (3.2.39), and (3.2.40), we get⁵

F_Y(y) =
  0,                           y ≤ −a,
  1/2 + (1/π) sin⁻¹(y/a),      |y| ≤ a,
  1,                           y ≥ a.      (3.2.41)

The cdf (3.2.41) can of course be obtained from the pdf (3.2.38) by integration: specifically, from F_Y(y) = ∫_{−∞}^{y} u(a − |t|)/(π√(a² − t²)) dt, we get F_Y(y) = 0 when y ≤ −a, F_Y(y) = (1/π) ∫_{−π/2}^{sin⁻¹(y/a)} (1/(a cos θ)) a cos θ dθ = (1/π){sin⁻¹(y/a) + π/2} = 1/2 + (1/π) sin⁻¹(y/a) when −a ≤ y ≤ a, and F_Y(y) = 1/2 + (1/π) sin⁻¹(1) = 1 when y ≥ a. Figure 3.15 shows the pdf

⁵ If a < 0, a will be replaced with |a|.

Fig. 3.15 The pdf f_X(x), pdf f_Y(y) of Y = 2 sin(X + θ), and cdf F_Y(y) of Y when X ∼ U[−π, π)

f X (x), pdf f Y (y), and cdf FY (y) when a = 2. Exercise 3.4 discusses a slightly
more general problem. ♦

Example 3.2.23 For a continuous random variable X with cdf F_X, obtain the cdf, pdf, and pmf of Z = sgn(X), where

sgn(x) = u(x) − u(−x) = 2u(x) − 1 =
  1,   x > 0,
  0,   x = 0,
  −1,  x < 0      (3.2.42)

is called the sign function. First, we have the cdf F_Z(z) = P(Z ≤ z) = P(sgn(X) ≤ z) as

F_Z(z) =
  0,           z < −1,
  P(X ≤ 0),    −1 ≤ z < 1,
  1,           z ≥ 1
= F_X(0) u(z + 1) + {1 − F_X(0)} u(z − 1),   (3.2.43)

and thus the pdf f_Z(z) = (d/dz) F_Z(z) of Z is

f_Z(z) = F_X(0) δ(z + 1) + {1 − F_X(0)} δ(z − 1).   (3.2.44)

In addition, we also have

p_Z(z) =
  F_X(0),        z = −1,
  1 − F_X(0),    z = 1,
  0,             otherwise      (3.2.45)

as the pmf of Z. ♦

3.2.2.3 Finding Transformations

We have so far discussed obtaining the probability functions of Y = g(X) when the probability functions of X and g are given. We now briefly consider the inverse problem of finding g when the cdf's F_X and F_Y are given or, equivalently, finding the function that transforms X with cdf F_X into Y with cdf F_Y.

The problem can be solved by making use of the uniform distribution as the intermediate step: i.e.,

F_X(x) → uniform distribution → F_Y(y).   (3.2.46)

Specifically, assume that the cdf F_X and the inverse F_Y⁻¹ of the cdf F_Y are continuous and increasing. Letting

Z = F_X(X),   (3.2.47)

we have X = F_X⁻¹(Z) and {F_X(X) ≤ z} = {X ≤ F_X⁻¹(z)} because F_X is continuous and increasing. Therefore, the cdf⁶ of Z is F_Z(z) = P(Z ≤ z) = P(F_X(X) ≤ z) = P(X ≤ F_X⁻¹(z)) = F_X(F_X⁻¹(z)) = z for 0 ≤ z < 1. In other words, we have

Z ∼ U[0, 1).   (3.2.48)

Next, consider

V = F_Y⁻¹(Z).   (3.2.49)

Then, recollecting (3.2.48), we get the cdf P(V ≤ y) = P(F_Y⁻¹(Z) ≤ y) = P(Z ≤ F_Y(y)) = F_Z(F_Y(y)) = F_Y(y) of V because F_Z(x) = x for x ∈ (0, 1). In other words, when X ∼ F_X, we have V = F_Y⁻¹(Z) = F_Y⁻¹(F_X(X)) ∼ F_Y, which is summarized as the following theorem:
Theorem 3.2.4 The function that transforms a random variable X with cdf FX into
a random variable with cdf FY is g = FY−1 ◦ FX .
Figure 3.16 illustrates some of the interesting results such as

X ∼ FX → FX (X ) ∼ U [0, 1), (3.2.50)


Z ∼ U [0, 1) → FY−1 (Z ) ∼ FY , (3.2.51)

and

X ∼ FX → FY−1 (FX (X )) ∼ FY . (3.2.52)


6 Here, because FX is a continuous function, FX FX−1 (z) = z as it is discussed in (3.A.26).

X ∼ F_X → Z = F_X(X) ∼ U[0, 1) → V = F_Y⁻¹(Z) = F_Y⁻¹(F_X(X)) ∼ F_Y

Fig. 3.16 Transformation of a random variable X with cdf F_X into Y with cdf F_Y

Theorem 3.2.4 can be used in the generation of random numbers, for instance.
Example 3.2.24 From X ∼ U[0, 1), obtain the Rayleigh random variable Y ∼ f_Y(y) = (y/α²) exp(−y²/(2α²)) u(y).

Solution Because the cdf of Y is F_Y(y) = {1 − exp(−y²/(2α²))} u(y), the function we are looking for is g(x) = F_Y⁻¹(x) = √(−2α² ln(1 − x)), as we can easily see from (3.2.51). In other words, if X ∼ U[0, 1), then Y = √(−2α² ln(1 − X)) has the cdf F_Y(y) = {1 − exp(−y²/(2α²))} u(y). Note that we conversely have V = 1 − exp(−Y²/(2α²)) ∼ U(0, 1). ♦
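The inverse-transform recipe of Example 3.2.24 can be sketched in a few lines of Python (an illustration, not part of the text; the fixed seed and the sample size are our choices). The empirical probability of {1 < Y ≤ 2} is compared with F_Y(2) − F_Y(1) for α = 1:

```python
import math
import random

def rayleigh_from_uniform(u, alpha=1.0):
    # g(x) = F_Y^{-1}(x) = sqrt(-2 alpha^2 ln(1 - x)), from Example 3.2.24
    return math.sqrt(-2.0 * alpha * alpha * math.log(1.0 - u))

random.seed(0)
samples = [rayleigh_from_uniform(random.random()) for _ in range(200000)]

# Empirical P(1 < Y <= 2) should approach F_Y(2) - F_Y(1) = e^{-1/2} - e^{-2}
p_emp = sum(1 for y in samples if 1.0 < y <= 2.0) / len(samples)
p_true = math.exp(-0.5) - math.exp(-2.0)
print(abs(p_emp - p_true) < 0.01)   # True, up to Monte Carlo error
```

The same idea, with any strictly increasing cdf F_Y, yields a sampler for F_Y from uniform random numbers, which is exactly the content of Theorem 3.2.4.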

Example 3.2.25 For X ∼ U(0, 1), consider the desired pmf

p_Y(y_n) = P(Y = y_n) =
  p_n,  n = 1, 2, …,
  0,    otherwise.      (3.2.53)

Then, letting p₀ = 0, the integer Y satisfying

Σ_{k=0}^{Y−1} p_k < X ≤ Σ_{k=0}^{Y} p_k   (3.2.54)

is the random variable with the desired pmf (3.2.53). ♦
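Condition (3.2.54) amounts to searching the cumulative sums of the pmf; a minimal Python sketch follows (the pmf values 0.2, 0.5, 0.3 are hypothetical, chosen only for illustration):

```python
import random

def sample_discrete(u, probs):
    # Return the smallest integer Y (1-based) with
    # sum_{k <= Y-1} p_k < u <= sum_{k <= Y} p_k, as in (3.2.54);
    # probs = [p_1, p_2, ...] is assumed to sum to 1.
    cum = 0.0
    for n, p in enumerate(probs, start=1):
        cum += p
        if u <= cum:
            return n
    return len(probs)   # guard against floating-point round-off

random.seed(1)
probs = [0.2, 0.5, 0.3]          # assumed pmf p_1, p_2, p_3
counts = {1: 0, 2: 0, 3: 0}
N = 100000
for _ in range(N):
    counts[sample_discrete(random.random(), probs)] += 1

freqs = [counts[n] / N for n in (1, 2, 3)]
print(all(abs(f - p) < 0.01 for f, p in zip(freqs, probs)))   # True
```

The empirical frequencies approach the prescribed p_n, confirming that (3.2.54) realizes the pmf (3.2.53).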

3.3 Expected Values and Moments

The probabilistic characteristics of a random variable can be most completely


described by the distribution via the cdf, pdf, or pmf of the random variable. On
the other hand, the distribution is not available in some cases, and we may wish to
summarize the characteristics as a few numbers in other cases.

In this section, we attempt to introduce some of the key notions for use in such
cases. Among the widely employed representative values, also called central values,
for describing the probabilistic characteristics of a random variable and a distribution
are the mean, median, and mode (Beckenbach and Bellam 1965; Bickel and Doksum
1977; Feller 1970; Hajek 1969; McDonough and Whalen 1995).

Definition 3.3.1 (mode) For a random variable X with pdf f X or pmf p X , if

f X (xmod ) ≥ f X (x) for X a continuous random variable, or (3.3.1)


p X (xmod ) ≥ p X (x) for X a discrete random variable (3.3.2)

holds true for all real numbers x, then the value x_mod is called the mode of X.

The mode is the value that could happen most frequently among all the values of a
random variable. In other words, the mode is the most probable value or, equivalently,
the value at which the pmf or pdf of a random variable is maximum.

Definition 3.3.2 (median) The value α satisfying both P(X ≤ α) ≥ 1/2 and P(X ≥ α) ≥ 1/2 is called the median of the random variable X.

Roughly speaking, the median is the value at which the cumulative probability is 0.5. When the distribution is symmetric, the point of symmetry of the cdf is the median. The median is one of the quantiles of order p, or 100p percentiles, defined as the number ξ_p satisfying P(X ≤ ξ_p) ≥ p and P(X ≥ ξ_p) ≥ 1 − p for 0 < p < 1. For a random variable X with cdf F_X, we have

p ≤ F_X(ξ_p) ≤ p + P(X = ξ_p).   (3.3.3)

Therefore, if P(X = ξ_p) = 0, as for a continuous random variable, the solution to F_X(x) = p is ξ_p: the solution to this equation is unique when the cdf F_X is a strictly increasing function, but otherwise, there exist many solutions, each of which is the quantile of order p.
The median and mode are not unique in some cases. When there exist many
medians, the middle value is regarded as the median in some cases.

Example 3.3.1 For the pmf p_X(1) = 1/3, p_X(2) = 1/2, and p_X(3) = 1/6, because P(X ≤ 2) = 1/3 + 1/2 = 5/6 ≥ 1/2 and P(X ≥ 2) = 1/2 + 1/6 = 2/3 ≥ 1/2, the median⁷ is 2. For the uniform distribution over the set {1, 2, 3, 4}, any real number in the interval⁸ [2, 3] is the median, and the mode is 1, 2, 3, or 4. ♦

⁷ Note that if the median x_med is defined by P(X ≤ x_med) = P(X ≥ x_med), we do not have the median in this pmf.
⁸ Note that if the median x_med is defined by P(X ≤ x_med) = P(X ≥ x_med), any real number in the interval (2, 3) is the median.



Example 3.3.2 For the distribution N(1, 1), the mode is 1. When the pmf is p_X(1) = 1/3, p_X(2) = 1/2, and p_X(3) = 1/6, the mode of X is 2. ♦
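Definitions 3.3.1 and 3.3.2 can be checked directly on the pmf of Examples 3.3.1 and 3.3.2 (a small Python sketch added for illustration, not part of the original text):

```python
pmf = {1: 1 / 3, 2: 1 / 2, 3: 1 / 6}

# Mode: the value at which the pmf is maximum (Definition 3.3.1)
mode = max(pmf, key=pmf.get)

# Median: a value a with P(X <= a) >= 1/2 and P(X >= a) >= 1/2 (Definition 3.3.2)
def is_median(a):
    le = sum(p for x, p in pmf.items() if x <= a)
    ge = sum(p for x, p in pmf.items() if x >= a)
    return le >= 0.5 and ge >= 0.5

medians = [x for x in pmf if is_median(x)]
print(mode)      # 2
print(medians)   # [2]
```

For this pmf the mode and median coincide at 2, in agreement with both examples.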

3.3.1 Expected Values

We now introduce the most widely used representative value, the expected value.

Definition 3.3.3 (expected value) For a random variable X with cdf F_X, the value E{X} = ∫_{−∞}^{∞} x dF_X(x), i.e.,

E{X} =
  ∫_{−∞}^{∞} x f_X(x) dx,   X continuous random variable,
  Σ_{x=−∞}^{∞} x p_X(x),    X discrete random variable      (3.3.4)

is called the expected value or mean of X if ∫_{−∞}^{∞} |x| dF_X(x) < ∞.

The expected value is also called the stochastic average, statistical average, or ensem-
ble average, and E{X } is also written as E(X ) or E[X ].
Example 3.3.3 For X ∼ U[a, b), we have the expected value E{X} = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2 of X. The mode of X is any real number between a and b, and the median is the same as the mean (a + b)/2. ♦

Example 3.3.4 (Stoyanov 2013) For unimodal random variables, the median usually lies between the mode and the mean: an example of exception is shown here. Assume the pdf

f(x) =
  0,                x ≤ 0,
  x,                0 < x ≤ c,
  c e^{−λ(x−c)},    x > c      (3.3.5)

of X with c ≥ 1 and c²/2 + c/λ = 1. Then, the mean, median, and mode of X are μ = c³/3 + c²/λ + c/λ², 1, and c, respectively. If we choose c > 1 sufficiently close to 1, then λ ≈ 2 and μ ≈ 13/12, and the median is smaller than the mean and mode although f(x) is unimodal. ♦

Theorem 3.3.1 (Stoyanov 2013) A necessary condition for the mean E{X} to exist for a random variable X with cdf F is lim_{x→∞} x{1 − F(x)} = 0.

Proof Rewrite x{1 − F(x)} as x{1 − F(x)} = x{∫_{−∞}^{∞} f(t) dt − ∫_{−∞}^{x} f(t) dt} = x ∫_{x}^{∞} f(t) dt. Now, letting E{X} = m, we have

m = ∫_{−∞}^{x} t f(t) dt + ∫_{x}^{∞} t f(t) dt ≥ ∫_{−∞}^{x} t f(t) dt + x ∫_{x}^{∞} f(t) dt   (3.3.6)

for x > 0 because ∫_{x}^{∞} t f(t) dt ≥ x ∫_{x}^{∞} f(t) dt. Here, we should have lim_{x→∞} x ∫_{x}^{∞} f(t) dt = 0 for (3.3.6) to hold true because lim_{x→∞} ∫_{−∞}^{x} t f(t) dt = ∫_{−∞}^{∞} t f(t) dt = m. ♠

Based on the result (3.E.2) shown in Exercise 3.1, we can show that (Rohatgi and Saleh 2001)

E{X} = ∫_0^{∞} P(X > x) dx − ∫_{−∞}^{0} P(X ≤ x) dx   (3.3.7)

for any continuous random variable X, dictating that a necessary and sufficient condition for E{|X|} < ∞ is that both ∫_0^{∞} P(X > x) dx and ∫_{−∞}^{0} P(X ≤ x) dx converge.
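Formula (3.3.7) can be verified numerically for, say, an exponential random variable with P(X > x) = e^{−λx} for x ≥ 0 (an illustrative Python sketch; the exponential choice and the rate λ = 2 are ours, and the second integral in (3.3.7) vanishes since X ≥ 0):

```python
import math

lam = 2.0   # rate of an exponential random variable; E{X} = 1/lam

# E{X} = ∫_0^∞ P(X > x) dx from Eq. (3.3.7), truncated at a large upper limit
n, upper = 200000, 20.0
h = upper / n
tail_integral = sum(math.exp(-lam * (i + 0.5) * h) for i in range(n)) * h

print(abs(tail_integral - 1.0 / lam) < 1e-4)   # True
```

Integrating the tail probability thus reproduces the mean 1/λ without ever forming x f(x).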

3.3.2 Expected Values of Functions of Random Variables

Based on the discussions in the previous section, let us now consider the expected values of functions of a random variable. Let F_Y be the cdf of Y = g(X). Then, the expected value of Y = g(X) can be expressed as

E{Y} = ∫_{−∞}^{∞} y dF_Y(y).   (3.3.8)

In essence, the expected value of Y = g(X) can be evaluated using (3.3.8) after we have obtained the cdf, pdf, or pmf of Y from that of X. On the other hand, the expected value of Y = g(X) can be evaluated as E{Y} = E{g(X)} = ∫_{−∞}^{∞} g(x) dF_X(x), i.e.,

E{Y} =
  ∫_{−∞}^{∞} g(x) f_X(x) dx,   continuous random variable,
  Σ_{x=−∞}^{∞} g(x) p_X(x),    discrete random variable.      (3.3.9)

While the first approach (3.3.8) of evaluating the expected value of Y = g(X ) requires
that we need to first obtain the cdf, pdf, or pmf of Y from that of X , the second
approach (3.3.9) does not require the cdf, pdf, or pmf of Y . In the second approach,
we simply multiply the pdf f X (x) or pmf p X (x) of X with g(x) and then integrate
or sum without first having to obtain the cdf, pdf, or pmf of Y . In short, if there is

no other reason to obtain the cdf, pdf, or pmf of Y = g(X ), the second approach is
faster in the evaluation of the expected value of Y = g(X ).
Example 3.3.5 When X ∼ U[0, 1), obtain the expected value of Y = X².

Solution (Method 1) Based on (3.2.35), we can obtain the pdf f_Y(y) = (1/(2√y)){f_X(√y) + f_X(−√y)} u(y) = (1/(2√y)){u(y) − u(y − 1)} of Y. Next, using (3.3.8), we get E{Y} = ∫_0^1 y/(2√y) dy = (1/2) ∫_0^1 √y dy = 1/3.

(Method 2) Using (3.3.9), we can directly obtain E{Y} = ∫_0^1 x² dx = 1/3. ♦
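Both methods of Example 3.3.5 can be reproduced numerically (a Python sketch added for illustration; midpoint-rule integration is our choice):

```python
def midpoint_integral(f, a, b, n=100000):
    # midpoint Riemann sum of f over (a, b)
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Method 1: integrate y f_Y(y) with f_Y(y) = 1/(2 sqrt(y)) on (0, 1)
m_via_fy = midpoint_integral(lambda y: y / (2.0 * y ** 0.5), 0.0, 1.0)

# Method 2 (Eq. (3.3.9)): integrate g(x) f_X(x) = x^2 directly, without finding f_Y
m_direct = midpoint_integral(lambda x: x * x, 0.0, 1.0)

print(abs(m_via_fy - 1.0 / 3.0) < 1e-6, abs(m_direct - 1.0 / 3.0) < 1e-6)
```

Both integrals converge to 1/3, and the second requires no derivation of f_Y at all, illustrating why Method 2 is usually faster.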
From the definition of the expected value, we can deduce the following properties:
(1) When a random variable X is non-negative, i.e., when P(X ≥ 0) = 1, we have E{X} ≥ 0.
(2) The expected value of a constant is the constant. In other words, if P(X = c) = 1, then E{X} = c.
(3) The expected value is a linear operator: that is, we have E{Σ_{i=1}^{n} a_i g_i(X)} = Σ_{i=1}^{n} a_i E{g_i(X)}.
(4) For any function h, we have |E{h(X)}| ≤ E{|h(X)|}.
(5) If h₁(x) ≤ h₂(x) for every point x, then we have E{h₁(X)} ≤ E{h₂(X)}.
(6) For any function h, we have min(h(X)) ≤ E{h(X)} ≤ max(h(X)).
Example 3.3.6 Based on (3) above, we have E{a X + b} = aE{X } + b when a and
b are constants. ♦
Example 3.3.7 For a continuous random variable X ∼ U(1, 9) and h(x) = 1/√x, compare h(E{X}), E{h(X)}, min(h(X)), and max(h(X)).

Solution We have h(E{X}) = h(5) = 1/√5 from the result in Example 3.3.3 and E{h(X)} = ∫_{−∞}^{∞} h(x) f_X(x) dx = ∫_1^9 1/(8√x) dx = 1/2 from (3.3.9). In addition, min(h(X)) = 1/√9 = 1/3 and max(h(X)) = 1/√1 = 1. Therefore, 1/3 < 1/2 < 1, i.e., min(h(X)) ≤ E{h(X)} ≤ max(h(X)), confirming (6). ♦

3.3.3 Moments and Variance

Definition 3.3.4 (moment) For a random variable $X$ with cdf $F_X$, we call $m_n = E\{X^n\}$, i.e.,

$$m_n = \int_{-\infty}^{\infty} x^n\, dF_X(x) \qquad (3.3.10)$$

the $n$-th moment of $X$ if $E\{|X|^n\} = \int_{-\infty}^{\infty} |x|^n\, dF_X(x) < \infty$ for $n = 0, 1, \ldots$.
In other words, the expected value of a power of a random variable is called a moment, and the moment is one of the expected values of a function of a random variable. The $n$-th moment of $X$ can specifically be written as

$$m_n = \begin{cases} \int_{-\infty}^{\infty} x^n f_X(x)\, dx, & \text{continuous random variable}, \\ \sum_{x=-\infty}^{\infty} x^n p_X(x), & \text{discrete random variable}. \end{cases} \qquad (3.3.11)$$
Definition 3.3.5 (central moment) The expected value $\mu_n = E\{(X - E\{X\})^n\}$, i.e.,

$$\mu_n = \begin{cases} \int_{-\infty}^{\infty} (x - m_1)^n f_X(x)\, dx, & \text{continuous random variable}, \\ \sum_{x=-\infty}^{\infty} (x - m_1)^n p_X(x), & \text{discrete random variable} \end{cases} \qquad (3.3.12)$$

is called the $n$-th central moment of $X$ for $n = 0, 1, \ldots$.

From Definitions 3.3.4 and 3.3.5, it is easy to see that $m_0 = \mu_0 = 1$, $m_1 = E\{X\}$, and $\mu_1 = 0$. More generally, we have

$$\mu_n = \sum_{k=0}^{n} {}_n C_k\, m_k (-m_1)^{n-k} \qquad (3.3.13)$$

from $E\{(X - m_1)^n\} = E\left\{\sum_{k=0}^{n} {}_n C_k X^k (-m_1)^{n-k}\right\}$, and conversely,

$$m_n = \sum_{k=0}^{n} {}_n C_k\, \mu_k m_1^{n-k} \qquad (3.3.14)$$

from $m_n = E\left[\{(X - m_1) + m_1\}^n\right] = E\left\{\sum_{k=0}^{n} {}_n C_k (X - m_1)^k m_1^{n-k}\right\}$ between the moments $\{m_n\}_{n=0}^{\infty}$ and the central moments $\{\mu_n\}_{n=0}^{\infty}$. Often, we also consider the absolute moment $E\{|X|^n\}$.
Some of the moments and functions of moments are used more frequently than
the others in representing the probabilistic properties of a random variable. One such
important parameter is the variance.

Definition 3.3.6 (variance; standard deviation) The second central moment $\mu_2$ is called the variance, and the non-negative square root of the variance is called the standard deviation.
The variance of $X$ is often denoted by $\sigma_X^2$, $\text{Var}\{X\}$, or $V\{X\}$, and can also be expressed as

$$\sigma_X^2 = E\{X^2\} - E^2\{X\} = m_2 - m_1^2 \qquad (3.3.15)$$

from $E\{(X - E\{X\})^2\} = E\{X^2 - 2X E\{X\} + E^2\{X\}\} = E\{X^2\} - E^2\{X\}$. We also have $\text{Var}\{aX\} = a^2 \text{Var}\{X\}$ for any real number $a$.

Example 3.3.8 Assume the pdf $f_X(r) = \frac{1}{b-a}\{u(r-a) - u(r-b)\}$ for a uniform random variable $X$. Then, we have the mean

$$E\{X\} = \frac{a+b}{2} \qquad (3.3.16)$$

and variance

$$\text{Var}\{X\} = \frac{1}{12}(b-a)^2 \qquad (3.3.17)$$

from $\text{Var}\{X\} = \int_a^b \frac{x^2}{b-a}\, dx - \left(\frac{a+b}{2}\right)^2$. ♦
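As a quick numerical sketch of Example 3.3.8 (our own check, with an arbitrary choice of the endpoints $a$ and $b$), midpoint integration of $x f_X(x)$ and $(x - E\{X\})^2 f_X(x)$ reproduces $(a+b)/2$ and $(b-a)^2/12$:

```python
# Illustrative endpoints; any a < b should work the same way.
a, b = 2.0, 7.0
n = 200_000
h = (b - a) / n
xs = [a + (k + 0.5) * h for k in range(n)]  # midpoints of the cells
pdf = 1.0 / (b - a)                          # uniform pdf on (a, b)
mean = sum(x * pdf for x in xs) * h
var = sum((x - mean) ** 2 * pdf for x in xs) * h
print(mean, var)  # ~ 4.5 and ~ 25/12
```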

Example 3.3.9 For the exponential random variable $X$ with pdf $f(r) = \lambda e^{-\lambda r} u(r)$, the mean $E\{X\} = \lambda \int_0^{\infty} r e^{-\lambda r}\, dr$ is

$$E\{X\} = \frac{1}{\lambda} \qquad (3.3.18)$$

and the variance is

$$\text{Var}\{X\} = \frac{1}{\lambda^2} \qquad (3.3.19)$$

from $\text{Var}\{X\} = \lambda \int_0^{\infty} r^2 e^{-\lambda r}\, dr - \left(\frac{1}{\lambda}\right)^2$. ♦
 
Example 3.3.10 Obtain the mean and variance for $X \sim N(m, \sigma^2)$.

Solution Letting $\frac{x-m}{\sqrt{2}\sigma} = t$, we have the expected value $E\{X\} = \int_{-\infty}^{\infty} \frac{\sqrt{2}\sigma t + m}{\sqrt{2\pi}\sigma} \exp(-t^2)\, \sqrt{2}\sigma\, dt = \frac{1}{\sqrt{\pi}} \left\{\int_{-\infty}^{\infty} \sqrt{2}\sigma t \exp(-t^2)\, dt + m \int_{-\infty}^{\infty} \exp(-t^2)\, dt\right\} = \frac{m}{\sqrt{\pi}}\sqrt{\pi}$, i.e.,

$$E\{X\} = m, \qquad (3.3.20)$$

and the second moment $E\{X^2\} = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} \left(2\sigma^2 t^2 + 2\sqrt{2}m\sigma t + m^2\right) \sqrt{2}\sigma \exp(-t^2)\, dt = \frac{1}{\sqrt{\pi}} \left\{2\sigma^2 \int_{-\infty}^{\infty} t^2 \exp(-t^2)\, dt + m^2 \sqrt{\pi}\right\}$, i.e.,

$$E\{X^2\} = \frac{1}{\sqrt{\pi}} \left(\sigma^2 \sqrt{\pi} + m^2 \sqrt{\pi}\right) = \sigma^2 + m^2. \qquad (3.3.21)$$

Consequently, $\text{Var}\{X\} = E\{X^2\} - m^2 = \sigma^2$. In (3.3.20) and (3.3.21), we have used

$$\int_{-\infty}^{\infty} t^k \exp(-t^2)\, dt = \begin{cases} \sqrt{\pi}, & k = 0, \\ 0, & k = 1, \\ \frac{\sqrt{\pi}}{2}, & k = 2. \end{cases} \qquad (3.3.22)$$

The first and third results in (3.3.22) can be shown easily by recollecting that the integral of the standard normal pdf over the entire real line is 1, i.e., $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx = 1$, together with integration by parts. The second result is based on the facts that $\int_0^{\infty} t \exp(-t^2)\, dt < \infty$ and that $t \exp(-t^2)$ is an odd function. ♦

Example 3.3.11 We have the mean

E{X } = np (3.3.23)

and variance

σ 2X = np(1 − p) (3.3.24)

for the binomial random variable X ∼ b(n, p). ♦


Example 3.3.12 We have the mean $E\{X\} = \sum_{k=1}^{\infty} k \frac{e^{-\lambda}\lambda^k}{k!} = \lambda \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!}$, i.e.,

$$E\{X\} = \lambda, \qquad (3.3.25)$$

second moment $E\{X^2\} = \lambda^2 + \lambda$, and variance

$$\sigma_X^2 = \lambda \qquad (3.3.26)$$

for the Poisson random variable $X \sim P(\lambda)$. ♦
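The Poisson results of Example 3.3.12 can be checked directly (a small sketch of ours, with an arbitrary $\lambda$) by summing the pmf series far enough into the tail; the pmf is built recursively to avoid overflowing factorials:

```python
import math

lam = 3.5
kmax = 120  # the pmf beyond this is negligible for lam = 3.5
pmf = []
p = math.exp(-lam)
for k in range(kmax):
    pmf.append(p)
    p *= lam / (k + 1)  # recursion p(k+1) = p(k) * lam / (k+1)

mean = sum(k * q for k, q in enumerate(pmf))
second = sum(k * k * q for k, q in enumerate(pmf))
var = second - mean ** 2
print(mean, var)  # both close to lam = 3.5
```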



Example 3.3.13 Consider the Cauchy random variable with pdf $f(r) = \frac{\alpha}{\pi} \cdot \frac{1}{r^2 + \alpha^2}$. Then, because the absolute moments are $E\{|X|\} = \infty$ and $E\{|X|^2\} = \infty$, the mean and variance do not exist. Similarly, for a random variable with pmf

$$p(k) = \begin{cases} \frac{6}{\pi^2 k^2}, & k = 1, 2, \ldots, \\ 0, & \text{otherwise}, \end{cases} \qquad (3.3.27)$$

the mean and variance do not exist. ♦


Example 3.3.14 Consider $X \sim N(0, \sigma^2)$. Recollecting $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$ shown in (1.4.83), we have $\int_{-\infty}^{\infty} \exp(-\alpha x^2)\, dx = 2 \int_0^{\infty} e^{-t} \frac{dt}{2\sqrt{\alpha t}} = \frac{1}{\sqrt{\alpha}} \Gamma\left(\frac{1}{2}\right)$, i.e.,

$$\int_{-\infty}^{\infty} \exp(-\alpha x^2)\, dx = \sqrt{\frac{\pi}{\alpha}}, \qquad (3.3.28)$$

which can also be obtained from $\int_{-\infty}^{\infty} \frac{\sqrt{\alpha}}{\sqrt{\pi}} \exp(-\alpha x^2)\, dx = 1$. Differentiating (3.3.28) $k$ times with respect to $\alpha$ using (3.2.18), we get

$$\int_{-\infty}^{\infty} x^{2k} \exp(-\alpha x^2)\, dx = \frac{(2k-1)!!}{2^k \alpha^k} \sqrt{\frac{\pi}{\alpha}} \qquad (3.3.29)$$

for $k = 1, 2, \ldots$, which can also be obtained as $\int_{-\infty}^{\infty} x^{2k} \exp(-\alpha x^2)\, dx = 2 \int_0^{\infty} \left(\frac{t}{\alpha}\right)^k e^{-t} \frac{dt}{2\sqrt{\alpha t}} = \frac{1}{\alpha^k \sqrt{\alpha}} \Gamma\left(k + \frac{1}{2}\right) = \frac{(2k-1)!!}{2^k \alpha^k \sqrt{\alpha}} \Gamma\left(\frac{1}{2}\right)$, where

$$(2k-1)!! = (2k-1)(2k-3) \times \cdots \times 3 \times 1 \qquad (3.3.30)$$

for $k$ a natural number. Based on the symmetry of the pdf $f(x)$ of $N(0, \sigma^2)$ and (3.3.29), we get

$$E\{X^n\} = \begin{cases} 0, & n \text{ is odd}, \\ (n-1)!!\, \sigma^n, & n \text{ is even}. \end{cases} \qquad (3.3.31)$$

We specifically have

$$E\{X^4\} = 3\sigma^4 \qquad (3.3.32)$$

and

$$E\{X^6\} = 15\sigma^6. \qquad (3.3.33)$$

In addition, when $n$ is an even number, $E\{|X|^n\} = E\{X^n\}$. When $n$ is an odd number $2k+1$, we have $E\{|X|^n\} = \int_{-\infty}^{\infty} |x|^n f_X(x)\, dx = \frac{2}{\sqrt{2\pi}\sigma} \int_0^{\infty} x^{2k+1} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx = 2^k k! \sqrt{\frac{2}{\pi}}\, \sigma^{2k+1} = 2^{\frac{n-1}{2}} \Gamma\left(\frac{n+1}{2}\right) \sqrt{\frac{2}{\pi}}\, \sigma^n$ because $\int_0^{\infty} x^n e^{-a^2 x^2}\, dx = \int_0^{\infty} \frac{t^{\frac{n}{2}}}{a^n} e^{-t} \frac{dt}{2a\sqrt{t}} = \frac{1}{2a^{n+1}} \int_0^{\infty} t^{\frac{n}{2}-\frac{1}{2}} e^{-t}\, dt$, i.e.,

$$\int_0^{\infty} x^n e^{-a^2 x^2}\, dx = \frac{1}{2a^{n+1}} \Gamma\left(\frac{n+1}{2}\right). \qquad (3.3.34)$$

In summary, we have

$$E\{|X|^n\} = \begin{cases} 2^k k! \sqrt{\frac{2}{\pi}}\, \sigma^{2k+1}, & n = 2k+1,\ k = 0, 1, \ldots, \\ (n-1)!!\, \sigma^n, & n \text{ is even}, \end{cases} \qquad (3.3.35)$$

with which

$$E\{|X|\} = \sqrt{\frac{2}{\pi}}\, \sigma \qquad (3.3.36)$$

and

$$E\{|X|^3\} = \sqrt{\frac{8}{\pi}}\, \sigma^3 \qquad (3.3.37)$$

can be confirmed. ♦
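Two of the closed forms of Example 3.3.14, namely $E\{X^4\} = 3\sigma^4$ and $E\{|X|\} = \sqrt{2/\pi}\,\sigma$, can be verified by numerical integration of the $N(0, \sigma^2)$ pdf; the sketch below (ours, with an arbitrary $\sigma$) truncates the negligible tail beyond $\pm 10\sigma$:

```python
import math

sigma = 2.0
n = 400_000
lo, hi = -10 * sigma, 10 * sigma
h = (hi - lo) / n

def pdf(x):
    # pdf of N(0, sigma^2)
    return math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

m4 = 0.0
abs1 = 0.0
for k in range(n):
    x = lo + (k + 0.5) * h
    p = pdf(x)
    m4 += x ** 4 * p
    abs1 += abs(x) * p
m4 *= h
abs1 *= h
print(m4, abs1)  # ~ 3*sigma**4 = 48 and ~ sqrt(2/pi)*sigma
```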

Example 3.3.15 (Romano and Siegel 1986; Stoyanov 2013) When the distribution is symmetric, i.e., when the cdf $F$ satisfies $F(-x) = 1 - F(x)$, all the odd-ordered moments are zero. On the other hand, the converse does not necessarily hold true. For example, consider the pdf

$$f_\gamma(x) = \frac{1}{24} \exp\left(-x^{\frac{1}{4}}\right) \left\{1 - \gamma \sin\left(x^{\frac{1}{4}}\right)\right\} u(x) \qquad (3.3.38)$$

with $\gamma \in [-1, 1]$. Using that

$$\int x^n e^{ax} \sin bx\, dx = e^{ax} \sum_{k=1}^{n+1} \frac{(-1)^{k+1} n!\, x^{n-k+1}}{(n-k+1)! \left(a^2+b^2\right)^{\frac{k}{2}}} \sin(bx + kt) \qquad (3.3.39)$$

with $t = \sin^{-1}\left(-\frac{b}{\sqrt{a^2+b^2}}\right)$, we can show that $\int_0^{\infty} x^k \exp\left(-x^{\frac{1}{4}}\right) \sin\left(x^{\frac{1}{4}}\right) dx = 0$ for $k = 0, 1, \ldots$. Thus, for any value of $\gamma$, the $k$-th moment is $\int_0^{\infty} \frac{x^k}{24} \exp\left(-x^{\frac{1}{4}}\right) dx = \frac{1}{6} \int_0^{\infty} v^{4k+3} e^{-v}\, dv = \frac{1}{6} \Gamma(4k+4)$.

Fig. 3.17 The pdf $f_\gamma(x) = \frac{1}{24} \exp\left(-x^{\frac{1}{4}}\right) \left\{1 - \gamma \sin\left(x^{\frac{1}{4}}\right)\right\} u(x)$: when $\gamma \ne \beta$, although $f(x) = \frac{1}{2}\left\{f_\gamma(x) u(x) + f_\beta(-x) u(-x)\right\}$ is not symmetric, all odd-ordered moments are 0

Now, the pdf

$$f(x) = \begin{cases} \frac{1}{2} f_\gamma(x), & x \ge 0, \\ \frac{1}{2} f_\beta(-x), & x < 0 \end{cases} \qquad (3.3.40)$$

for $\gamma \ne \beta$ is not an even function, and the $(2n+1)$-st moment of $f(x)$ is

$$\int_{-\infty}^{\infty} x^{2n+1} f(x)\, dx = \frac{1}{2} \left\{\int_0^{\infty} x^{2n+1} f_\gamma(x)\, dx + \int_{-\infty}^{0} x^{2n+1} f_\beta(-x)\, dx\right\} = \frac{1}{2} \left\{\int_0^{\infty} x^{2n+1} f_\gamma(x)\, dx - \int_0^{\infty} x^{2n+1} f_\beta(x)\, dx\right\} = 0 \qquad (3.3.41)$$

because the moment of $f_\gamma(x)$ is equal to that of $f_\beta(x)$ at the same order. Specifically, when $\gamma = 1$ and $\beta = -1$, all the odd-ordered moments are 0 and the even-ordered moments are $m_{2n} = \frac{1}{6}(8n+3)!$. The pdf (3.3.38) is shown in Fig. 3.17. ♦

3.3.4 Characteristic and Moment Generating Functions

We have discussed so far how we can obtain the moments based on the cdf, pdf, or pmf. On the other hand, we can obtain the moments more easily by using the Laplace or Fourier transform, just as we can solve, for example, differential equations more easily by using the Laplace transform.

The set $\left\{\frac{1}{\sqrt{2\pi}} e^{j\omega x}\right\}_{\omega \in \mathbb{R}}$ of complex orthonormal basis functions has the property $\left\langle \frac{1}{\sqrt{2\pi}} e^{j\omega x}, \frac{1}{\sqrt{2\pi}} e^{j\nu x} \right\rangle = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j\omega x} e^{-j\nu x}\, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j(\omega-\nu)x}\, dx$, i.e.,

$$\left\langle \frac{1}{\sqrt{2\pi}} e^{j\omega x}, \frac{1}{\sqrt{2\pi}} e^{j\nu x} \right\rangle = \delta(\omega - \nu), \qquad (3.3.42)$$
which can be shown easily from, for example, (1.E.11), where $j = \sqrt{-1}$. The Fourier transform $H(\omega) = \mathcal{F}\{h(x)\}$ of $h(x)$ and the inverse Fourier transform $h(x) = \mathcal{F}^{-1}\{H(\omega)\}$ of $H(\omega)$ can be expressed as

$$H(\omega) = \int_{-\infty}^{\infty} h(x) e^{-j\omega x}\, dx \qquad (3.3.43)$$

and

$$h(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega) e^{j\omega x}\, d\omega, \qquad (3.3.44)$$

respectively, based on the set $\left\{\frac{1}{\sqrt{2\pi}} e^{j\omega x}\right\}_{\omega \in \mathbb{R}}$ of orthonormal basis functions.

3.3.4.1 Characteristic Functions

Definition 3.3.7 (characteristic function) The function

$$\varphi_X(\omega) = \begin{cases} \int_{-\infty}^{\infty} f_X(x) e^{j\omega x}\, dx, & \text{continuous random variable}, \\ \sum_{x=-\infty}^{\infty} p_X(x) e^{j\omega x}, & \text{discrete random variable}, \end{cases} \qquad (3.3.45)$$

which is the expected value $\varphi_X(\omega) = E\left\{e^{j\omega X}\right\}$, is called the characteristic function (cf) of $X$.

If we let $p_k = p_X(x_k) = p_X(k) = P(X = x_k) = P(X = k)$ for an integer $k$, then the cf of the discrete random variable $X$ can be expressed as

$$\varphi_X(\omega) = \sum_{k=-\infty}^{\infty} p_k e^{jk\omega} \qquad (3.3.46)$$

because we can put $x_k = k$ in discrete random variables as discussed following Definition 3.1.4.

Theorem 3.3.2 If the cdf’s of two random variables are the same, then their cf’s
are the same, and vice versa.

In other words, the cf is also a function with which we can characterize the
probabilistic properties of a random variable.

Example 3.3.16 For a geometric random variable with pmf $p(k) = (1-\alpha)^k \alpha$ for $k \in \{0, 1, \ldots\}$, the cf is

$$\varphi(\omega) = \frac{\alpha}{1 - (1-\alpha)e^{j\omega}}. \qquad (3.3.47)$$

If the pmf is $p(k) = (1-\alpha)^{k-1} \alpha$ for $k \in \{1, 2, \ldots\}$ for a geometric random variable, then the cf is $\varphi(\omega) = \frac{\alpha e^{j\omega}}{1 - (1-\alpha)e^{j\omega}}$. For the NB distribution with pmf (2.5.14), the cf is

$$\varphi(\omega) = \frac{\alpha^r}{\left\{1 - (1-\alpha)e^{j\omega}\right\}^r} \qquad (3.3.48)$$

while the cf is $\varphi(\omega) = \left\{\frac{\alpha e^{j\omega}}{1 - (1-\alpha)e^{j\omega}}\right\}^r$ if the NB distribution has pmf (2.5.17). ♦

Example 3.3.17 We have the cf $\varphi(\omega) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sigma} \exp\left\{-\frac{1}{2\sigma^2}\left(x^2 - 2mx + m^2\right) + j\omega x\right\} dx = \exp\left(-\frac{\sigma^2\omega^2}{2} + jm\omega\right) \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\sigma^2}\left(x - m - j\omega\sigma^2\right)^2\right\} dx$, i.e.,

$$\varphi(\omega) = \exp\left(-\frac{\sigma^2\omega^2}{2} + jm\omega\right) \qquad (3.3.49)$$

for the normal random variable with mean $m$ and variance $\sigma^2$. ♦

3.3.4.2 Properties of Characteristic Functions

The cf $\varphi(\omega)$ has the following properties:

(1) The cf $\varphi(\omega)$ has its maximum magnitude of 1 at $\omega = 0$. In other words, $|\varphi(\omega)| \le \varphi(0) = 1$.
(2) The cf $\varphi(\omega)$ is uniformly continuous at every real number $\omega$.
(3) The cf $\varphi(\omega)$ is positive semi-definite. In other words,

$$\sum_{l=1}^{n} \sum_{k=1}^{n} z_l z_k^* \varphi(\omega_l - \omega_k) \ge 0, \qquad (3.3.50)$$

where $\{\omega_k\}_{k=1}^{n}$ are real numbers and $\{z_k\}_{k=1}^{n}$ are complex numbers.

Proof

(1) We easily have $\varphi(0) = E\{1\} = 1$. Assuming the pdf $f$ when the cf is $\varphi(\omega)$, we have $|\varphi(\omega)| \le \int_{-\infty}^{\infty} \left|e^{j\omega x}\right| f(x)\, dx = 1$.

(2) Consider a real number $\mu_0$ such that $0 < \mu_0 < \frac{\varepsilon}{2}$ for a positive real number $\varepsilon$. Assuming a periodic function $\bar{b}(y)$ with period $\pi$ and $\bar{b}(y) = |y| + \mu_0$ for $|y| \le \frac{\pi}{2}$, we have

$$|\sin y| \le \bar{b}(y). \qquad (3.3.51)$$

Next, for a random variable $X$ with pdf $f_X$ and cf $\varphi(\omega)$, let $E\left\{\bar{b}(X)\right\} = \int_{-\infty}^{\infty} \bar{b}(x) f_X(x)\, dx = \mu_0 + \sum_{n=-\infty}^{\infty} \int_{n\pi - \frac{\pi}{2}}^{n\pi + \frac{\pi}{2}} |x - n\pi| f_X(x)\, dx = \bar{\mu}$. Then, we have $0 < \mu_0 \le \bar{\mu} \le \frac{\pi}{2} + \mu_0 < \infty$ and

$$E\left\{\bar{b}(\nu X)\right\} = \mu_0 + \sum_{n=-\infty}^{\infty} \int_{n\pi - \frac{\pi}{2}}^{n\pi + \frac{\pi}{2}} |\nu(x - n\pi)| f_X(x)\, dx = |\nu|\left(\bar{\mu} - \mu_0\right) + \mu_0 \qquad (3.3.52)$$

for a constant $\nu$. In addition, using (3.3.51), we get

$$\left|e^{j\alpha X} - e^{j\beta X}\right| = 2\left|\sin \frac{(\alpha - \beta)X}{2}\right| \le 2\bar{b}\left(\frac{\alpha - \beta}{2} X\right) \qquad (3.3.53)$$

from $\left|e^{j\alpha X} - e^{j\beta X}\right| = \left[\{\cos(\alpha X) - \cos(\beta X)\}^2 + \{\sin(\alpha X) - \sin(\beta X)\}^2\right]^{\frac{1}{2}} = \sqrt{2 - 2\cos\{(\alpha - \beta)X\}}$. If we let $\delta = \frac{\varepsilon - 2\mu_0}{\bar{\mu}}$, then $0 < \delta < \infty$. Therefore, from (3.3.52) and (3.3.53), we get $|\varphi(\alpha) - \varphi(\beta)| = \left|E\left\{e^{j\alpha X} - e^{j\beta X}\right\}\right| \le E\left\{\left|e^{j\alpha X} - e^{j\beta X}\right|\right\} \le |\alpha - \beta|\left(\bar{\mu} - \mu_0\right) + 2\mu_0 < \frac{\varepsilon - 2\mu_0}{\bar{\mu}}\left(\bar{\mu} - \mu_0\right) + 2\mu_0$, i.e.,

$$|\varphi(\alpha) - \varphi(\beta)| < \begin{cases} \frac{\varepsilon - 2\mu_0}{\bar{\mu}}\left(\bar{\mu} - \mu_0\right) + 2\mu_0, & \text{if } \bar{\mu} - \mu_0 \ne 0, \\ 2\mu_0, & \text{if } \bar{\mu} - \mu_0 = 0 \end{cases} \le \varepsilon \qquad (3.3.54)$$

when $|\alpha - \beta| < \delta$. Thus, for any $\varepsilon > 0$, we have $|\varphi(\alpha) - \varphi(\beta)| < \varepsilon$ if $|\alpha - \beta| < \delta = \frac{\varepsilon - 2\mu_0}{\bar{\mu}}$, implying that $\varphi(\omega)$ is uniformly continuous at every real number $\omega$.

(3) For a random variable with cf $\varphi(\omega)$ and pdf $f$, we have

$$\sum_{l=1}^{n} \sum_{k=1}^{n} z_l z_k^* \varphi(\omega_l - \omega_k) = E\left\{\left|\sum_{l=1}^{n} z_l e^{j\omega_l X}\right|^2\right\} \ge 0 \qquad (3.3.55)$$

because $\sum_{l=1}^{n} \sum_{k=1}^{n} z_l z_k^* \varphi(\omega_l - \omega_k) = \sum_{l=1}^{n} \sum_{k=1}^{n} z_l z_k^* \int_{-\infty}^{\infty} e^{j(\omega_l - \omega_k)x} f(x)\, dx = \int_{-\infty}^{\infty} \left(\sum_{l=1}^{n} z_l e^{j\omega_l x}\right) \left(\sum_{k=1}^{n} z_k^* e^{-j\omega_k x}\right) f(x)\, dx$. ♠

Theorem 3.3.3 The cf of $Y = aX + b$ is

$$\varphi_Y(\omega) = e^{jb\omega} \varphi_X(a\omega) \qquad (3.3.56)$$

when $\varphi_X$ is the cf of $X$ and $a, b \in \mathbb{R}$.

Example 3.3.18 We have obtained the cf $\varphi_X(\omega) = \exp\left(-\frac{\sigma^2\omega^2}{2} + jm\omega\right)$ of $X \sim N(m, \sigma^2)$ in Example 3.3.17. Based on this result and (3.3.56), we can obtain the cf of $Y = \frac{1}{\sigma}(X - m)$ as $\varphi_Y(\omega) = \exp\left(-j\omega\frac{m}{\sigma}\right) \exp\left\{-\frac{\sigma^2}{2}\left(\frac{\omega}{\sigma}\right)^2 + jm\frac{\omega}{\sigma}\right\}$, i.e.,

$$\varphi_Y(\omega) = \exp\left(-\frac{\omega^2}{2}\right). \qquad (3.3.57)$$

This result implies that if $X \sim N(m, \sigma^2)$, then $\frac{1}{\sigma}(X - m) \sim N(0, 1)$. ♦
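Theorem 3.3.3 is easy to verify for a discrete distribution, where the cf is a finite sum. The sketch below (our own illustration; the pmf and the values of $a$, $b$, $\omega$ are arbitrary) compares the cf of $Y = aX + b$ computed directly with $e^{jb\omega}\varphi_X(a\omega)$:

```python
import cmath

support = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]
a, b, omega = 2.5, -1.0, 0.8

def cf(points, w):
    # E{e^{jwZ}} for a discrete Z taking the given points with probs
    return sum(p * cmath.exp(1j * w * x) for x, p in zip(points, probs))

lhs = cf([a * x + b for x in support], omega)        # cf of Y = aX + b
rhs = cmath.exp(1j * b * omega) * cf(support, a * omega)
print(abs(lhs - rhs))  # ~ 0
```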

3.3.4.3 Moment Generating Functions

Definition 3.3.8 (moment generating function) The function

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx}\, dF_X(x), \qquad (3.3.58)$$

which is the expected value $M_X(t) = E\left\{e^{tX}\right\}$, is called the moment generating function (mgf) of $X$.

Denoting the Laplace transform of the pdf $f$ of $X$ by $\tilde{M}(t) = \mathcal{L}(f)$, the mgf of $X$ is $\tilde{M}(-t)$. The mgf $M(t)$ and the cf $\varphi(\omega)$ of a random variable are related as

$$\varphi(\omega) = M(t)\big|_{t = j\omega}, \qquad (3.3.59)$$

implying that the cf and mgf are basically the same in the sense that, by taking the inverse transform of the cf or mgf, we can obtain the cdf. The cf is always guaranteed to converge, whereas the region of convergence of the mgf should be considered for the inverse transform, and for some distributions the mgf does not exist.
Based on the discussion in Sect. 3.2 and Definitions 3.3.7 and 3.3.8, the cf of $Y = g(X)$ can be obtained as $\varphi_Y(\omega) = E\left\{e^{j\omega Y}\right\} = \int_{-\infty}^{\infty} e^{j\omega y}\, dF_Y(y)$, i.e.,

$$\varphi_Y(\omega) = E\left\{e^{j\omega g(X)}\right\} = \int_{-\infty}^{\infty} e^{j\omega g(x)}\, dF_X(x). \qquad (3.3.60)$$

We can subsequently obtain the mgf of $Y = g(X)$ as

$$M_Y(t) = \int_{-\infty}^{\infty} e^{tg(x)}\, dF_X(x) \qquad (3.3.61)$$

by replacing $j\omega$ with $t$ in (3.3.60).

3.3.4.4 Characteristic and Cumulative Distribution Functions

It is easy to see that

$$\varphi(\omega) = \mathcal{F}\{f(x)\}\big|_{\omega \to -\omega} \qquad (3.3.62)$$

from the definition of the cf. In other words, the cf is the complex conjugate of the Fourier transform of the pdf. Hence, we can obtain the pdf from the cf as

$$f(x) = \mathcal{F}^{-1}\{\varphi(-\omega)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(\omega) e^{-j\omega x}\, d\omega \qquad (3.3.63)$$

from the property of the Fourier transform.

Now, let us express the cdf $F(x) = \int_{-\infty}^{x} f(t)\, dt$ as the convolution

$$F(x) = f(x) * u(x) \qquad (3.3.64)$$

of the pdf $f(x)$ and the unit step function $u(x)$. The Fourier transform of the convolution of two functions is the product of the Fourier transforms of the two functions. Noting that the Fourier transform of the unit step function $u(x)$ is

$$\mathcal{F}\{u(x)\} = \pi\delta(\omega) + \frac{1}{j\omega} \qquad (3.3.65)$$

as discussed in Exercise 1.26, the Fourier transform of the cdf $F(x)$ is $\mathcal{F}\{F(x)\} = \mathcal{F}\{f(x)\}\mathcal{F}\{u(x)\} = \varphi(-\omega)\left\{\pi\delta(\omega) + \frac{1}{j\omega}\right\}$, i.e.,

$$\mathcal{F}\{F(x)\} = \pi\varphi(0)\delta(\omega) + \frac{\varphi(-\omega)}{j\omega}. \qquad (3.3.66)$$

Inverse transforming (3.3.66), the cdf $F(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \left\{\pi\varphi(0)\delta(\omega) + \frac{\varphi(-\omega)}{j\omega}\right\} \exp(j\omega x)\, d\omega$ can be expressed as (Papoulis 1962)

$$F(x) = \frac{\varphi(0)}{2} + \frac{1}{2\pi j} \int_{-\infty}^{\infty} \frac{\varphi(-\omega)}{\omega} \exp(j\omega x)\, d\omega = \frac{\varphi(0)}{2} + \frac{j}{2\pi} \int_{-\infty}^{\infty} \frac{\varphi(\omega)}{\omega} \exp(-j\omega x)\, d\omega \qquad (3.3.67)$$

in terms of the cf $\varphi(\omega)$.

3.3.4.5 Cumulants

Expanding the natural logarithm $\psi(\omega) = \ln \varphi(\omega) = \ln\left\{1 + \sum_{s=1}^{\infty} (j\omega)^s \frac{m_s}{s!}\right\}$ of the cf $\varphi(\omega)$ in a power series of $j\omega$ near $\omega = 0$, we get

$$\psi(\omega) = \sum_{s=1}^{\infty} \frac{m_s}{s!}(j\omega)^s - \frac{1}{2}\left\{\sum_{s=1}^{\infty} \frac{m_s}{s!}(j\omega)^s\right\}^2 + \frac{1}{3}\left\{\sum_{s=1}^{\infty} \frac{m_s}{s!}(j\omega)^s\right\}^3 + \cdots = m_1 \frac{j\omega}{1!} + \left(m_2 - m_1^2\right)\frac{(j\omega)^2}{2!} + \left(m_3 - 3m_1 m_2 + 2m_1^3\right)\frac{(j\omega)^3}{3!} + \cdots = \sum_{n=1}^{\infty} k_n \frac{(j\omega)^n}{n!}, \qquad (3.3.68)$$

based on which the cumulant is defined as follows:

Definition 3.3.9 (cumulant) The parameter $k_n$ in (3.3.68) can be expressed as

$$k_n = \frac{\partial^n}{\partial (j\omega)^n} \psi(\omega) \bigg|_{\omega=0} \qquad (3.3.69)$$

and is called the $n$-th cumulant.

Example 3.3.19 The first, second, and third cumulants are the same as the mean $k_1 = m_1$, the variance $k_2 = m_2 - m_1^2 = \sigma^2$, and the third central moment $k_3 = m_3 - 3m_2 m_1 + 2m_1^3 = \mu_3$, respectively. In addition, the fourth cumulant is $k_4 = m_4 - 4m_3 m_1 - 3m_2^2 + 12m_2 m_1^2 - 6m_1^4 = \mu_4 - 3\left(m_2 - m_1^2\right)^2 = \mu_4 - 3\sigma^4$. ♦

Definition 3.3.10 (coefficient of variation; skewness; kurtosis) Let the mean, variance, $n$-th central moment, and $n$-th cumulant be $m$, $\sigma^2$, $\mu_n$, and $k_n$, respectively. Then, $v_1 = \frac{\sigma}{m}$, $v_2 = \frac{\mu_3}{\sigma^3} = \frac{k_3}{k_2^{3/2}}$, and $v_3 = \frac{\mu_4}{\sigma^4} = 3 + \frac{k_4}{k_2^2}$ are called the coefficient of variation, skewness, and kurtosis, respectively.


Fig. 3.18 The skewness $v_2$ and symmetry of a pdf: $v_2 = 0$ for a symmetric pdf, $v_2 > 0$ for a right-skewed pdf, and $v_2 < 0$ for a left-skewed pdf

In characterizing the probabilistic properties of a random variable, we can first consider the expected value, and then the variance. The coefficient of variation,
skewness, and kurtosis can then be considered in the characterization. These three
parameters represent deviations of a distribution from the normal distribution. The
coefficient of variation is a measure of dispersion normalized by the mean. The
skewness represents the degree of asymmetry: specifically, the distribution is sym-
metric about the mean when v2 = 0, the mean is greater than the mode and median
(called right-skewed or positively-skewed) when v2 > 0, and the mean is smaller
than the mode and median (called left-skewed or negatively-skewed) when v2 < 0.
Figure 3.18 shows an example. Skewness is frequently used along with kurtosis to
better judge the likelihood of events falling in the tails of a probability distribution.

Example 3.3.20 The skewness of the geometric distribution with parameter $\alpha$ is $v_2 = \frac{2-\alpha}{\sqrt{1-\alpha}}$, and that of NB$(r, \alpha)$ is $v_2 = \frac{2-\alpha}{\sqrt{(1-\alpha)r}}$. ♦

Example 3.3.21 Assume the pdf $f_X(x) = \lambda \exp(-\lambda x) u(x)$ of $X$. Then, as we have observed in Example 3.3.9, we have $E\{X\} = m = \frac{1}{\lambda}$, $\mu_2 = \text{Var}\{X\} = \sigma^2 = \frac{1}{\lambda^2}$, and $\mu_3 = \int_0^{\infty} \left(x - \frac{1}{\lambda}\right)^3 \lambda e^{-\lambda x}\, dx = \left[-\left\{\left(x - \frac{1}{\lambda}\right)^3 + \frac{3}{\lambda}\left(x - \frac{1}{\lambda}\right)^2 + \frac{6}{\lambda^2}\left(x - \frac{1}{\lambda}\right) + \frac{6}{\lambda^3}\right\} e^{-\lambda x}\right]_0^{\infty} = \frac{2}{\lambda^3}$. Therefore, the coefficient of variation is $v_1 = \frac{\sigma}{m} = \frac{1}{\lambda}\left(\frac{1}{\lambda}\right)^{-1} = 1$ and the skewness is $v_2 = \frac{\mu_3}{\sigma^3} = \frac{2}{\lambda^3}\left(\frac{1}{\lambda^3}\right)^{-1} = 2$. ♦
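The claim of Example 3.3.21 that the exponential skewness is 2 regardless of $\lambda$ can be checked numerically (our own sketch, with an arbitrary $\lambda$ and a tail truncated where it is negligible):

```python
import math

lam = 0.7
n = 400_000
hi = 50.0 / lam  # truncation point; the tail beyond is ~ e^{-50}
h = hi / n
xs = [(k + 0.5) * h for k in range(n)]
pdf = [lam * math.exp(-lam * x) for x in xs]
mean = sum(x * p for x, p in zip(xs, pdf)) * h
mu2 = sum((x - mean) ** 2 * p for x, p in zip(xs, pdf)) * h
mu3 = sum((x - mean) ** 3 * p for x, p in zip(xs, pdf)) * h
skew = mu3 / mu2 ** 1.5
print(mean, skew)  # ~ 1/lam and ~ 2
```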

The kurtosis $v_3$ represents how sharp the peak is when compared to the normal distribution: when $v_3 = 3$, the sharpness of the peak of the distribution is the same as that of the normal distribution; when $v_3 < 3$, the distribution is less sharp than the normal distribution, called platykurtic or mild-peaked; and when $v_3 > 3$, the distribution is sharper than the normal distribution, called leptokurtic or sharp-peaked.

Example 3.3.22 When the pdf is $f_X(x) = \lambda \exp(-\lambda x) u(x)$ for $X$, $\sigma = \frac{1}{\lambda}$ as we have observed in Example 3.3.21. In addition, $\mu_4 = \int_0^{\infty} \left(x - \frac{1}{\lambda}\right)^4 \lambda e^{-\lambda x}\, dx = \left[-\left\{\left(x - \frac{1}{\lambda}\right)^4 + \frac{4}{\lambda}\left(x - \frac{1}{\lambda}\right)^3 + \frac{12}{\lambda^2}\left(x - \frac{1}{\lambda}\right)^2 + \frac{24}{\lambda^3}\left(x - \frac{1}{\lambda}\right) + \frac{24}{\lambda^4}\right\} e^{-\lambda x}\right]_0^{\infty}$, i.e.,

$$\mu_4 = \frac{9}{\lambda^4}. \qquad (3.3.70)$$

Thus, the kurtosis is $v_3 = \frac{\mu_4}{\sigma^4} = \frac{9}{\lambda^4}\left(\frac{1}{\lambda^4}\right)^{-1} = 9$. ♦

3.3.5 Moment Theorem

When we obtain moments such as the mean and variance, we need to evaluate one integral for each of the moments. While working directly from the definition requires as many integrations as the number of moments we want, with the moment theorem we can first obtain the cf or mgf by one integration and then obtain the moments by differentiation: note that differentiation is in general easier to evaluate than integration.

Theorem 3.3.4 The $k$-th moment of $X$ can be obtained as

$$m_k = j^{-k} \frac{\partial^k}{\partial \omega^k} \varphi_X(\omega) \bigg|_{\omega=0} = j^{-k} \varphi_X^{(k)}(0) \qquad (3.3.71)$$

or

$$m_k = M_X^{(k)}(0) \qquad (3.3.72)$$

if the cf $\varphi_X(\omega)$ or the mgf $M_X(t)$ of $X$ is differentiable $k$ times at 0.

Proof First, if we evaluate $E\left\{X^k\right\} = \int_{-\infty}^{\infty} x^k f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x) \frac{1}{j^k} \frac{\partial^k}{\partial \omega^k} e^{j\omega x}\Big|_{\omega=0}\, dx = j^{-k} \frac{\partial^k}{\partial \omega^k} \left\{\int_{-\infty}^{\infty} f_X(x) e^{j\omega x}\, dx\right\}\Big|_{\omega=0}$, recollecting $j^{-k} \frac{\partial^k}{\partial \omega^k} e^{j\omega x}\Big|_{\omega=0} = x^k$, we get

$$E\left\{X^k\right\} = j^{-k} \frac{\partial^k}{\partial \omega^k} \varphi_X(\omega) \bigg|_{\omega=0}. \qquad (3.3.73)$$

Similarly, by differentiating the mgf $M_X(t)$ $k$ times, we get $M_X^{(k)}(t) = E\left\{X^k e^{tX}\right\}$ and, subsequently, the desired result. ♠

Theorem 3.3.4 is referred to as the moment theorem.
  
Example 3.3.23 For $X \sim N(m, \sigma^2)$, we have $\varphi_X(\omega) = \exp\left(-\frac{\omega^2\sigma^2}{2} + jm\omega\right)$ as observed in Example 3.3.17. Thus, $E\{X\} = j^{-1}\varphi_X'(0) = m$, $E\left\{X^2\right\} = j^{-2}\varphi_X''(0) = m^2 + \sigma^2$, and $\text{Var}\{X\} = \sigma^2$. ♦

Example 3.3.24 For a random variable $X$ with pdf $f_X(x) = \lambda \exp(-\lambda x) u(x)$, we have the mgf $M_X(t) = \int_0^{\infty} \lambda \exp(-\lambda x) \exp(tx)\, dx = \frac{\lambda}{\lambda - t}$ for⁹ $t < \lambda$. Thus, we get $M_X'(0) = \frac{\lambda}{(\lambda-t)^2}\Big|_{t=0} = \frac{1}{\lambda}$ and $M_X''(0) = \frac{2\lambda(\lambda-t)}{(\lambda-t)^4}\Big|_{t=0} = \frac{2}{\lambda^2}$. Based on these two results, we obtain $E\{X\} = \frac{1}{\lambda}$ and $\text{Var}\{X\} = \frac{1}{\lambda^2}$, which are the same as those obtained directly from the definition of moments in Example 3.3.9. ♦
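The derivative evaluations in Example 3.3.24 can be mimicked numerically: central finite differences of the mgf $M(t) = \lambda/(\lambda - t)$ at $t = 0$ stand in for the symbolic derivatives. This is our own sketch, with an arbitrary $\lambda$:

```python
lam = 2.0
M = lambda t: lam / (lam - t)  # exponential mgf, valid for t < lam

eps = 1e-4
m1 = (M(eps) - M(-eps)) / (2 * eps)               # ~ E{X} = 1/lam
m2 = (M(eps) - 2 * M(0.0) + M(-eps)) / eps ** 2   # ~ E{X^2} = 2/lam^2
var = m2 - m1 ** 2
print(m1, var)  # ~ 0.5 and ~ 0.25
```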

In evaluating the moments of discrete random variables via the moment theorem, it is often convenient to let $z = e^{j\omega}$ and $s = e^t$ when using the cf and mgf, respectively.

Example 3.3.25 For $K \sim b(n, p)$, the cf is $\varphi_K(\omega) = \sum_{k=0}^{n} {}_n C_k\, p^k (1-p)^{n-k} e^{jk\omega}$, i.e.,

$$\varphi_K(\omega) = \left\{pe^{j\omega} + (1 - p)\right\}^n. \qquad (3.3.74)$$

Now, letting $e^{j\omega} = z$ and writing the cf as $\gamma_K(z) = \varphi_K(\omega)\big|_{e^{j\omega} = z}$, we get

$$\gamma_K(z) = (pz + 1 - p)^n. \qquad (3.3.75)$$

We then have

$$\gamma_K^{(i)}(1) = E\{K(K-1) \cdots (K-i+1)\} \qquad (3.3.76)$$

because $\frac{\partial^i}{\partial z^i} \gamma_K(z) = \frac{\partial^i}{\partial z^i} E\left\{z^K\right\} = E\left\{K(K-1) \cdots (K-i+1) z^{K-i}\right\}$. Therefore, $\gamma_K(1) = 1$, $\gamma_K'(1) = E\{K\} = np$, and $\gamma_K''(1) = E\left\{K^2\right\} - E\{K\} = n(n-1)p^2$. From these results, we have $E\{K\} = np$ and $\text{Var}\{K\} = np(1-p)$. ♦

Example 3.3.26 Consider $K \sim P(\lambda)$. Then, $\gamma_K(z) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k z^k}{k!} = e^{\lambda(z-1)}$ and $\varphi_K(\omega) = \exp\left\{\lambda\left(e^{j\omega} - 1\right)\right\}$ from $P(K = k) = e^{-\lambda}\frac{\lambda^k}{k!}$ for $k = 0, 1, 2, \ldots$. In other words, $\gamma_K'(1) = E\{K\} = \lambda$ and $\gamma_K''(1) = E\left\{K^2\right\} - E\{K\} = \lambda^2$. Therefore, $E\{K\} = \lambda$ and $\text{Var}\{K\} = \lambda$. Meanwhile, the mgf of $K$ is $G_K(s) = M_K(t)\big|_{s=e^t} = \sum_{k=0}^{\infty} s^k \frac{\lambda^k}{k!} e^{-\lambda}$, i.e.,

$$G_K(s) = e^{\lambda(s-1)}, \qquad (3.3.77)$$

with which we can also obtain the mean, variance, etc. ♦

⁹ Unless stated otherwise, an appropriate region of convergence is assumed when we consider the mgf.
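The factorial-moment reading of (3.3.76) is easy to confirm numerically: differentiating $\gamma_K(z) = (pz + 1 - p)^n$ at $z = 1$ by central finite differences should return $np$ and $n(n-1)p^2$. This is our own sketch, with arbitrary $n$ and $p$:

```python
n, p = 10, 0.3
g = lambda z: (p * z + 1 - p) ** n  # gamma_K(z) for K ~ b(n, p)

eps = 1e-5
d1 = (g(1 + eps) - g(1 - eps)) / (2 * eps)              # ~ n*p
d2 = (g(1 + eps) - 2 * g(1.0) + g(1 - eps)) / eps ** 2  # ~ n*(n-1)*p^2
print(d1, d2)  # ~ 3.0 and ~ 8.1
```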
The moment theorem also implies the following: similar to how a function can be expressed in terms of the coefficients of its Taylor series or Fourier series, the moments or central moments are the coefficients with which the pdf can be expressed. Specifically, if we express the mgf $M_X(t)$ of a random variable $X$ in a series expansion, we have

$$M_X(t) = \sum_{n=0}^{\infty} \frac{E\{X^n\}}{n!} t^n. \qquad (3.3.78)$$

Now, when the coefficients $\left\{\frac{E\{X^n\}}{n!}\right\}_{n=0}^{\infty}$ of two distributions are the same, i.e., when the moments are the same, the two distributions will be the same. Based on this observation, by comparing the first few coefficients such as the mean and second moment, we can investigate how similar a distribution is to another.

3.4 Conditional Distributions

In this section, we discuss the conditional distribution (Park et al. 2017; Sveshnikov 1968) of a random variable under conditions given in the form of an event.

3.4.1 Conditional Probability Functions

Definition 3.4.1 (conditional pmf) When the occurrence of an event $A$ with $P(A) > 0$ is assumed, the function $p_{X|A}(x) = P(X = x|A)$, i.e.,

$$p_{X|A}(x) = \frac{P(X = x, A)}{P(A)} \qquad (3.4.1)$$

is called the conditional pmf of the discrete random variable $X$.

Example 3.4.1 Let $X$ be the face number from rolling a fair die. When we know that the number is an odd number, the pmf of $X$ is

$$p_{X|A}(x) = \begin{cases} \frac{1}{3}, & x = 1, 3, 5, \\ 0, & \text{otherwise} \end{cases} \qquad (3.4.2)$$

because $P(A) = \frac{1}{2}$ for the event $A = \{\text{the number is an odd number}\}$. ♦

Definition 3.4.2 (conditional cdf; conditional pdf) When the occurrence of an event $A$ with $P(A) > 0$ is assumed, the function

$$F_{X|A}(x) = P(X \le x|A) = \frac{P(X \le x, A)}{P(A)} \qquad (3.4.3)$$

is called the conditional cdf of $X$, and the function

$$f_{X|A}(x) = \frac{d}{dx} F_{X|A}(x) \qquad (3.4.4)$$

is called the conditional pdf of $X$.

Here, $F_{X|A}(x|A)$ or $F_X(x|A)$ is sometimes used to denote $F_{X|A}(x)$. Similarly, $f_{X|A}(x)$ is also written as $f_{X|A}(x|A)$ or $f_X(x|A)$. Because the conditional cdf is a cdf, we have $F_{X|A}(\infty) = 1$, $F_{X|A}(-\infty) = 0$, and $F_{X|A}(x_1) \le F_{X|A}(x_2)$ for $x_1 < x_2$.

Recollecting $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$ shown in (2.4.1) with the conditioning event $B = \{x_1 < X \le x_2\}$, we have $P(A|x_1 < X \le x_2) = \frac{P(x_1 < X \le x_2|A)}{P(x_1 < X \le x_2)} P(A)$, i.e.,

$$P(A|x_1 < X \le x_2) = \frac{F_{X|A}(x_2) - F_{X|A}(x_1)}{F_X(x_2) - F_X(x_1)} P(A). \qquad (3.4.5)$$

Let $x_1 = x$ and $x_2 = x + \Delta x$ in (3.4.5). Then, we get $\lim_{\Delta x \to 0} P(A|B) = P(A|X = x) = \lim_{\Delta x \to 0} \frac{F_{X|A}(x+\Delta x) - F_{X|A}(x)}{F_X(x+\Delta x) - F_X(x)} P(A)$, i.e.,

$$P(A|X = x) = \frac{f_{X|A}(x)}{f_X(x)} P(A), \qquad (3.4.6)$$

because $\lim_{\Delta x \to 0} \frac{F_{X|A}(x+\Delta x) - F_{X|A}(x)}{F_X(x+\Delta x) - F_X(x)}$ can be written as $\lim_{\Delta x \to 0} \frac{\{F_{X|A}(x+\Delta x) - F_{X|A}(x)\}/\Delta x}{\{F_X(x+\Delta x) - F_X(x)\}/\Delta x}$. The result (3.4.6) can be expressed as

$$f_{X|A}(x) P(A) = P(A|X = x) f_X(x). \qquad (3.4.7)$$

By integrating (3.4.7) and noting that $\int_{-\infty}^{\infty} f_{X|A}(x)\, dx = 1$, we get the following theorem:

Theorem 3.4.1 We have

$$P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx, \qquad (3.4.8)$$

which is called the total probability theorem for continuous random variables. Similarly,

$$P(A) = \sum_{x=-\infty}^{\infty} P(A|X = x) p_X(x) \qquad (3.4.9)$$

is called the total probability theorem for discrete random variables.

Example 3.4.2 Consider a rod with thickness 0. Cut the rod into two parts. Choose one of the two parts at random and cut it into two. Find the probability $P_{T2}$ that the three parts obtained in this way can make a triangle.

Solution Let the length of the rod be 1. As in Examples 2.3.6 and 2.3.7, let the point of the first cutting be $X$. Then, the pdf of $X$ is $f_X(v) = u(v)u(1-v)$. Call the interval $[0, X]$ the left piece and the interval $[X, 1]$ the right piece on the real line. When $X = t$, we get

$$P_{T2} = \int_{-\infty}^{\infty} P(\text{triangle with the three pieces}|X = t) f_X(t)\, dt = \int_0^1 P(\text{triangle with the three pieces}|X = t)\, dt \qquad (3.4.10)$$

based on (3.4.8). We can make a triangle when the sum of the lengths of any two pieces is larger than the length of the third piece. When $0 < t < \frac{1}{2}$, we should choose the right piece and the second cutting should be placed¹⁰ in $\left(\frac{1}{2}, t + \frac{1}{2}\right)$. Thus, we have $P(\text{triangle with the three pieces}|X = t) = P(\text{choose the right piece})\, P\left(\text{the second cutting is in } \left(\frac{1}{2}, t + \frac{1}{2}\right) \,\middle|\, \text{choose the right piece}\right)$, i.e.,

$$P(\text{triangle with the three pieces}|X = t) = \frac{1}{2} \cdot \frac{\text{length of } \left(\frac{1}{2}, t + \frac{1}{2}\right)}{\text{length of the right piece}} = \frac{1}{2} \cdot \frac{t}{1-t}. \qquad (3.4.11)$$

Similarly, when $\frac{1}{2} < t < 1$, we should choose the left piece and the second cutting should be placed¹¹ in $\left(t - \frac{1}{2}, \frac{1}{2}\right)$. Thus, $P(\text{triangle with the three pieces}|X = t) = P(\text{choose the left piece})\, P\left(\text{the second cutting is in } \left(t - \frac{1}{2}, \frac{1}{2}\right) \,\middle|\, \text{choose the left piece}\right)$, i.e.,

$$P(\text{triangle with the three pieces}|X = t) = \frac{1}{2} \cdot \frac{1-t}{t}. \qquad (3.4.12)$$

Using (3.4.11) and (3.4.12) in (3.4.10), we get $P_{T2} = \int_0^{\frac{1}{2}} \frac{1}{2} \cdot \frac{t}{1-t}\, dt + \int_{\frac{1}{2}}^{1} \frac{1}{2} \cdot \frac{1-t}{t}\, dt = \frac{1}{2}\left[-t - \ln|1-t|\right]_{t=0}^{\frac{1}{2}} + \frac{1}{2}\left[-t + \ln|t|\right]_{t=\frac{1}{2}}^{1} = \ln 2 - \frac{1}{2} \approx 0.1931$.

Meanwhile, considering that a triangle cannot be made if the shorter piece is chosen among the first two pieces, assume that we choose the longer of the two pieces and then cut it into two. Then, we have the probability

$$2P_{T2} = 2\ln 2 - 1 \approx 0.3863 \qquad (3.4.13)$$

of making a triangle, which is higher than the probability $\frac{1}{4}$ obtained in Example 2.3.7. This is a consequence of using the information "we should choose the longer piece among the first two pieces." ♦

¹⁰ Denoting the location of the second cutting by $y \in (t, 1)$, the lengths of the three pieces are $t$, $y - t$, and $1 - y$, resulting in the condition $\frac{1}{2} < y < t + \frac{1}{2}$ of $y$ to make a triangle.

¹¹ Denoting the location of the second cutting by $y \in (0, t)$, the lengths of the three pieces are $y$, $t - y$, and $1 - t$, resulting in the condition $t - \frac{1}{2} < y < \frac{1}{2}$ of $y$ to make a triangle.
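The value $P_{T2} = \ln 2 - \tfrac{1}{2}$ obtained in Example 3.4.2 can be reproduced by numerically integrating the two conditional probabilities; a midpoint-sum sketch of ours:

```python
import math

n = 200_000
h = 0.5 / n
# (1/2) * t/(1-t) on (0, 1/2), then (1/2) * (1-t)/t on (1/2, 1)
left = sum(0.5 * t / (1 - t) for t in ((k + 0.5) * h for k in range(n))) * h
right = sum(0.5 * (1 - t) / t
            for t in (0.5 + (k + 0.5) * h for k in range(n))) * h
p_t2 = left + right
print(p_t2, math.log(2) - 0.5)  # both ~ 0.1931
```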
The following theorem, similar to Theorem 3.4.1, can be obtained from the total probability theorem $P(A) = \sum_{i=1}^{n} P(A|B_i) P(B_i)$ shown in (2.4.13) with $A = \{X \le x\}$:

Theorem 3.4.2 When the collection $\{B_i\}_{i=1}^{n}$ is a partition of the range of $X$, the cdf, the pdf for a continuous random variable $X$, and the pmf for a discrete random variable $X$ can be expressed as

$$F_X(x) = \sum_{i=1}^{n} F_{X|B_i}(x) P(B_i), \qquad (3.4.14)$$

$$f_X(x) = \sum_{i=1}^{n} f_{X|B_i}(x) P(B_i), \qquad (3.4.15)$$

and

$$p_X(x) = \sum_{i=1}^{n} p_{X|B_i}(x) P(B_i), \qquad (3.4.16)$$

respectively.

Example 3.4.3 Consider a communication system transmitting bits of 0 and 1, and let $H_0$ and $H_1$ be the events that a bit of 0 and 1 is sent, respectively. Assume $P(H_0) = \frac{1}{4}$, $P(H_1) = \frac{3}{4}$, and the conditional pdf's of $X$ are

$$f_{X|H_0}(x) = \frac{2}{3} u\left(\frac{3}{4} - |x|\right) \qquad (3.4.17)$$

and

$$f_{X|H_1}(x) = \frac{2}{3} u\left(x - \frac{1}{4}\right) u\left(\frac{7}{4} - x\right). \qquad (3.4.18)$$

Then, we have

$$f_X(x) = \frac{1}{4} f_{X|H_0}(x) + \frac{3}{4} f_{X|H_1}(x) = \begin{cases} \frac{1}{6}, & -\frac{3}{4} \le x < \frac{1}{4}, \\ \frac{2}{3}, & \frac{1}{4} \le x < \frac{3}{4}, \\ \frac{1}{2}, & \frac{3}{4} \le x < \frac{7}{4}, \\ 0, & \text{otherwise} \end{cases} \qquad (3.4.19)$$

as the pdf of $X$. ♦
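The mixture pdf (3.4.19) can be sanity-checked in code (our own sketch): the three piecewise values should come out as 1/6, 2/3, and 1/2, and the mixture should integrate to 1.

```python
def f_H0(x):                 # (2/3) u(3/4 - |x|)
    return 2/3 if abs(x) < 3/4 else 0.0

def f_H1(x):                 # (2/3) u(x - 1/4) u(7/4 - x)
    return 2/3 if 1/4 < x < 7/4 else 0.0

def f_X(x):
    return 0.25 * f_H0(x) + 0.75 * f_H1(x)

vals = (f_X(0.0), f_X(0.5), f_X(1.0))  # one point inside each piece
n = 100_000
h = 3.0 / n                             # integrate over [-1, 2]
total = sum(f_X(-1 + (k + 0.5) * h) for k in range(n)) * h
print(vals, total)
```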
Theorem 3.4.3 Using (3.4.8) in (3.4.7), we get

$$f_{X|A}(x) = \frac{P(A|X = x)}{P(A)} f_X(x) = \frac{P(A|X = x) f_X(x)}{\int_{-\infty}^{\infty} P(A|X = x) f_X(x)\, dx}, \qquad (3.4.20)$$

another expression of Bayes' theorem.

By integrating the conditional pdf (3.4.20), the conditional cdf discussed in (3.4.3) can be obtained as

$$F_{X|A}(x) = \int_{-\infty}^{x} f_{X|A}(t)\, dt = \frac{\int_{-\infty}^{x} P(A|X = t) f_X(t)\, dt}{\int_{-\infty}^{\infty} P(A|X = t) f_X(t)\, dt}. \qquad (3.4.21)$$

From (3.4.3) and (3.4.21), we also get

$$P(A, X \le x) = \int_{-\infty}^{x} P(A|X = t) f_X(t)\, dt. \qquad (3.4.22)$$

Example 3.4.4 Express the conditional cdf $F_{X|X \le a}(x)$ and the conditional pdf $f_{X|X \le a}(x)$ in terms of the pdf $f_X$ and the cdf $F_X$ of a continuous random variable $X$.

Solution First, the conditional cdf $F_{X|X \le a}(x) = P(X \le x|X \le a)$ can be written as

$$F_{X|X \le a}(x) = \frac{P(X \le x, X \le a)}{P(X \le a)}. \qquad (3.4.23)$$

Here, we have

$$F_{X|X \le a}(x) = \frac{P(X \le x)}{P(X \le a)} = \frac{F_X(x)}{F_X(a)} \qquad (3.4.24)$$

when $x \le a$ because $P(X \le x, X \le a) = P(X \le x)$, and

$$F_{X|X \le a}(x) = 1 \qquad (3.4.25)$$

when $x > a$ because $P(X \le x, X \le a) = P(X \le a)$. From (3.4.24) and (3.4.25), we finally have

$$f_{X|X \le a}(x) = \frac{f_X(x)}{F_X(a)} u(a - x), \qquad (3.4.26)$$

which is shown in Fig. 3.19. ♦

Fig. 3.19 The pdf $f_X(x)$ and conditional pdf $f_{X|X \le a}(x)$

Example 3.4.5 Let $F$ be the cdf of the time $X$ of a failure for a system: i.e., $F(t) = P(X \le t)$ is the probability that the system fails before time $t$, and $1 - F(t) = P(X > t)$ is the probability that the system does not fail before time $t$. We also define the conditional rate of failure $\beta(t)$ via

$$\beta(t)\, dt = P(\text{when the system does not fail before time } t, \text{ the system fails in the interval } (t, t + dt)). \qquad (3.4.27)$$

Letting $A = \{X > t\}$ in (3.4.3) and differentiating the result, we get the conditional pdf $f_{X|X > t}(x) = \frac{F'(x)}{1 - F(t)} = \frac{f(x)}{\int_t^{\infty} f(x)\, dx}$ of $X$. Using this result, the conditional rate of failure $\beta(t) = f_{X|X > t}(t)$ can be expressed as

$$\beta(t) = \frac{F'(t)}{1 - F(t)}. \qquad (3.4.28)$$

Solving the differential equation (3.4.28) for $F$, we have

$$F(x) = 1 - \exp\left(-\int_0^x \beta(t)\, dt\right) \qquad (3.4.29)$$

from $-\ln(1 - F(x)) = \int_0^x \beta(t)\, dt$. Subsequently, by differentiating (3.4.29), we get $f(x) = \beta(x) \exp\left(-\int_0^x \beta(t)\, dt\right)$. ♦
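As a small sanity check of ours for Example 3.4.5: a constant failure rate in (3.4.29) gives the exponential cdf $F(x) = 1 - e^{-\beta x}$, and applying (3.4.28) to this cdf should recover the constant rate at any $t$. The value of $\beta$ below is an arbitrary choice.

```python
import math

beta = 0.4
F = lambda x: 1 - math.exp(-beta * x)  # cdf from (3.4.29) with constant rate

eps = 1e-6
rates = []
for t in (0.5, 1.0, 3.0):
    Fp = (F(t + eps) - F(t - eps)) / (2 * eps)  # numerical F'(t)
    rates.append(Fp / (1 - F(t)))               # (3.4.28)
print(rates)  # each entry ~ beta = 0.4
```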

3.4.2 Expected Values Conditional on Event

Definition 3.4.3 (conditional expected value) The expectation

$$E\{X|A\} = \begin{cases} \int_{-\infty}^{\infty} x f_{X|A}(x)\, dx, & \text{continuous random variable}, \\ \sum_{x=-\infty}^{\infty} x p_{X|A}(x), & \text{discrete random variable} \end{cases} \qquad (3.4.30)$$

is the conditional expected value or conditional mean of $X$ when the event $A$ is assumed.

Example 3.4.6 When the pdf of $X$ is $f_X$, obtain the conditional mean of $X$ under the assumption $A = \{X \le a\}$.

Solution Using the conditional pdf (3.4.26), we can obtain the conditional mean $E\{X|A\} = \int_{-\infty}^{\infty} x f_{X|A}(x)\, dx = \int_{-\infty}^{\infty} x f_{X|X \le a}(x)\, dx$ as

$$E\{X|A\} = \frac{1}{F_X(a)} \int_{-\infty}^{a} x f_X(x)\, dx \qquad (3.4.31)$$

from (3.4.30). Here, $\lim_{a \to \infty} E\{X|A\} = \frac{1}{F_X(\infty)} \int_{-\infty}^{\infty} x f_X(x)\, dx$. Thus, (3.4.31) is in agreement with the fact that the mean $E\{X\}$ can be written as $E\{X|\Omega\}$, i.e.,

$$E\{X\} = \frac{1}{F_X(\infty)} \int_{-\infty}^{\infty} x f_X(x)\, dx \qquad (3.4.32)$$

because $F_X(\infty) = 1$. ♦

Definition 3.4.3 can be extended to the conditional expected value $E\{g(X)|A\} = \int_{-\infty}^{\infty} g(x)\, dF_{X|A}(x)$ of $Y = g(X)$ as

$$E\{g(X)|A\} = \begin{cases} \int_{-\infty}^{\infty} g(x) f_{X|A}(x)\, dx, & \text{continuous random variable}, \\ \sum_{x=-\infty}^{\infty} g(x) p_{X|A}(x), & \text{discrete random variable} \end{cases} \qquad (3.4.33)$$

when the event $A$ is assumed.

3.4.3 Evaluation of Expected Values via Conditioning

We have observed in Sect. 2.4 that the probability of an event can often be obtained
quite easily by first obtaining the conditional probability under an appropriate condi-
tion. We will now similarly see that obtaining the conditional expected value first will
be quite useful when we try to obtain the expected value. Evaluation of the expected
values via conditioning will be discussed again in Sect. 4.4.3.
Theorem 3.4.4 The expected value E{X} of X can be expressed as

E{X} = Σ_{k=1}^n E{X|A_k} P(A_k),   (3.4.34)

where the collection {A_1, A_2, . . ., A_n} is a partition of the range of X.

Proof We show the theorem for discrete random variables only. From (2.4.13), we easily
get E{X} = Σ_{x=−∞}^∞ x p_X(x) = Σ_{x=−∞}^∞ x Σ_{k=1}^n p_{X|A_k}(x)P(A_k) =
Σ_{k=1}^n {Σ_{x=−∞}^∞ x p_{X|A_k}(x)} P(A_k) = Σ_{k=1}^n E{X|A_k} P(A_k). ♠
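Theorem 3.4.4 can be verified for a simple discrete case. In the Python sketch below (our illustration, not part of the text), X is a fair-die outcome and the partition of the range is {X even} ∪ {X odd}; the weighted conditional means recover the unconditional mean E{X} = 7/2.

```python
from fractions import Fraction

# Fair-die pmf: p_X(x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cond_mean(pmf, event):
    # E{X|A} = (sum over A of x p_X(x)) / P(A)
    p_a = sum(p for x, p in pmf.items() if event(x))
    return sum(x * p for x, p in pmf.items() if event(x)) / p_a

p_even = sum(p for x, p in pmf.items() if x % 2 == 0)
total = (cond_mean(pmf, lambda x: x % 2 == 0) * p_even
         + cond_mean(pmf, lambda x: x % 2 == 1) * (1 - p_even))
print(total)  # prints 7/2, the unconditional mean E{X}
```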

Example 3.4.7 There are a red balls and b green balls in a box. Pick one ball at
random from the box: if it is red, we put it back into the box; and if it is green, we
discard it and put one red ball from another source into the box. Let X n be the number
of red balls in the box and E {X n } = Mn be the expected value of X n after repeating
this experiment n times. Show that
 
M_n = {1 − 1/(a+b)} M_{n−1} + 1   (3.4.35)

and, based on this result, confirm

M_n = a + b − b {1 − 1/(a+b)}^n.   (3.4.36)

Obtain the probability Pn that the ball picked at the n-th trial is red.

Solution We have

M_n = E{X_n | n-th ball is red} P(n-th ball is red)
      + E{X_n | n-th ball is green} P(n-th ball is green),   (3.4.37)

which can be rewritten as M_n = M_{n−1} M_{n−1}/(a+b) + (M_{n−1} + 1) {1 − M_{n−1}/(a+b)} =
M_{n−1} {1 − 1/(a+b)} + 1 because

E{X_n | n-th ball is red} = M_{n−1},   (3.4.38)

P(n-th ball is red) = M_{n−1}/(a+b),   (3.4.39)

E{X_n | n-th ball is green} = M_{n−1} + 1,   (3.4.40)

and

P(n-th ball is green) = 1 − M_{n−1}/(a+b).   (3.4.41)

Letting μ = 1 − 1/(a+b), we get M_n = aμ^n + (1 − μ^n)/(1 − μ) = a + b − b {1 − 1/(a+b)}^n from
M_n = μM_{n−1} + 1 = μ(μM_{n−2} + 1) + 1 = μ^2 M_{n−2} + 1 + μ = · · · = μ^n M_0 + 1 +
μ + · · · + μ^{n−1} and M_0 = a. We also have P_n = M_{n−1}/(a+b) = 1 − {b/(a+b)} {1 − 1/(a+b)}^{n−1} for
n = 1, 2, . . .. ♦
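The closed form (3.4.36) can also be checked against a direct simulation of the urn experiment. The Python sketch below is our illustration (the function name and the parameter values a = 3, b = 5, n = 10 are arbitrary):

```python
import random

def simulate_mean_red(a, b, n, trials=20000, seed=7):
    # Average number of red balls after n draws of the scheme in Example 3.4.7.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        red = a
        for _ in range(n):
            # With probability 1 - red/(a+b) a green ball is drawn and
            # replaced by a red one; a red draw is simply put back.
            if rng.random() >= red / (a + b):
                red += 1
        total += red
    return total / trials

a, b, n = 3, 5, 10
formula = a + b - b * (1 - 1 / (a + b)) ** n  # M_n from (3.4.36)
print(abs(simulate_mean_red(a, b, n) - formula) < 0.05)
```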

3.5 Classes of Random Variables

In this section, we discuss four classes of widely-used random variables (Hahn and
Shapiro 1967; Johnson and Kotz 1970; Kassam 1988; Song et al. 2002; Thomas
1986; Zwillinger and Kokoska 1999) in detail. We start with the normal random
variables, followed by the binomial, Poisson, and exponential random variables. The
normal distributions are again discussed extensively in Chap. 5.

3.5.1 Normal Random Variables

Definition 3.5.1 (normal random variable) A random variable with the pdf
 
f(x) = {1/√(2πσ^2)} exp{−(x − m)^2/(2σ^2)}   (3.5.1)

is called the normal random variable and its distribution is denoted by N(m, σ^2).

The normal distribution is the most important and widely-used distribution as


we will see from the central limit theorem, Theorem 6.2.12. We have already men-
tioned in Example 2.5.15 that the distribution N (0, 1) is called the standard normal
distribution.
Now, consider the standard normal pdf

φ(x) = {1/√(2π)} exp(−x^2/2)   (3.5.2)

and its integral, the standard normal cdf

Φ(x) = {1/√(2π)} ∫_{−∞}^x exp(−t^2/2) dt.   (3.5.3)

Then, from the symmetry of φ(x), we get

Φ(−x) = 1 − Φ(x)   (3.5.4)

as we observed in (3.1.28). In addition, the cdf of N(m, σ^2) can be expressed as

F(x) = Φ((x − m)/σ).   (3.5.5)

We have already seen in (3.3.49) that the cf and mgf of N(m, σ^2) are

ϕ(ω) = exp(jmω − σ^2 ω^2/2)   (3.5.6)

and

M(t) = exp(mt + σ^2 t^2/2),   (3.5.7)

respectively.
The tail probability of the normal distribution is used quite frequently in many
areas such as statistics, communications, and signal processing. Let us briefly discuss
an approximation of the tail probability of the normal distribution. First, the error
function Θ(x) = erf(x) = (2/√π) ∫_0^x exp(−t^2) dt can be expressed as

erf(x) = 2Φ(√2 x) − 1   (3.5.8)

in terms of the standard normal cdf Φ(x). For the tail integral

Q(x) = {1/√(2π)} ∫_x^∞ exp(−t^2/2) dt   (3.5.9)

of the standard normal pdf, also called the complementary standard normal cdf, we
have

{x/(1 + x^2)} φ(x) < Q(x) < φ(x)/x   (3.5.10)

for x > 0. Assume we approximate Q(x) as Q_a(x) = φ(x)/{(1 − a)x + a√(x^2 + b)} with
Q_a(0) = Q(0) = 1/2, in which case a√(2πb) = 2 should be satisfied. When we con-
sider only the case x > 0, it is known (Börjesson and Sundberg 1979) that
Q_a(x) is the optimum upper bound on Q(x) when a ≈ 0.344 and b ≈ 5.334 and the
optimum lower bound when a = 1/π and b = 2π. In addition, when a ≈ 0.339 and
b ≈ 5.510, Q_a(x) minimizes max_x |Q_a(x) − Q(x)|/Q(x).
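These bounds are easy to probe numerically: math.erfc gives Q(x) = erfc(x/√2)/2, and Q_a(x) can be evaluated directly. The Python sketch below (ours, not the book's) checks that the two parameter pairs quoted above do bracket Q(x):

```python
import math

def Q(x):
    # Complementary standard normal cdf: Q(x) = erfc(x / sqrt(2)) / 2.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_a(x, a, b):
    # The approximation Q_a(x) = phi(x) / ((1 - a) x + a sqrt(x^2 + b)).
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return phi / ((1.0 - a) * x + a * math.sqrt(x * x + b))

for x in (0.5, 1.0, 2.0, 4.0):
    lower = Q_a(x, 1.0 / math.pi, 2.0 * math.pi)  # optimum lower bound
    upper = Q_a(x, 0.344, 5.334)                  # optimum upper bound
    print(lower <= Q(x) <= upper)  # True for each x > 0
```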

3.5.2 Binomial Random Variables

Definition 3.5.2 (binomial random variable) The number of occurrences of an event


in a repetition of n Bernoulli trials with distribution b(1, p) is a binomial random
variable with distribution b(n, p).

If we let K be the number of occurrences of a desired event A with probability
P(A) = p in n Bernoulli trials, then we have the pmf p_K(k) = P_n(k) of K ∼ b(n, p),
where

P_n(k) = P(desired event occurs k times among n trials)
       = nC_k p^k q^{n−k}   (3.5.11)

for k = 0, 1, . . ., n, with q = 1 − p = P(A^c). We also have the cdf

F(x) = Σ_{k=0}^{⌊x⌋} nC_k p^k q^{n−k}   (3.5.12)

for ⌊x⌋ ≤ x < ⌊x⌋ + 1 of b(n, p), and the probability

P(x ≤ K ≤ y) = Σ_{k=x}^y P_n(k)   (3.5.13)

of the event {x ≤ K ≤ y} for K ∼ b(n, p). The pdf of b(n, p) can be written as
f(x) = Σ_{k=0}^n nC_k p^k q^{n−k} δ(x − k).

Example 3.5.1 Let X be the number of 2’s when we roll a fair die five times. Then,
X ∼ b(5, 1/6) and P_5(k) = 5C_k (1/6)^k (5/6)^{5−k} for k = 0, 1, . . ., 5. ♦

Example 3.5.2 A fair die is rolled seven times. Let Y be the number of even
numbers. Then, Y ∼ b(7, 1/2) and P_7(k) = 7C_k (1/2)^k (1/2)^{7−k} = (1/128) 7C_k for k = 0, 1,
. . ., 7. ♦
The sequence {P_n(k)}_{k=0}^n increases until k = (n + 1)p − 1 and k = (n + 1)p
when (n + 1)p is an integer and until k = [(n + 1)p] when (n + 1)p is not an integer,
and then decreases.
 
Example 3.5.3 In b(3, 1/4), P_n(k) is maximum at k = (n + 1)p − 1 = 0 and k = 1.
Specifically, P_3(0) = P_3(1) = 27/64, P_3(2) = 9/64, and P_3(3) = 1/64. ♦

In evaluating P_n(k) = nC_k p^k (1 − p)^{n−k}, we need to calculate nC_k, which
becomes rather difficult when n is large and k is near n/2. We now discuss some
methods to alleviate this problem by considering the asymptotic approximations of
P_n(k) as n → ∞.
Definition 3.5.3 (small o) When

lim_{x→∞} f(x)/g(x) = 0,   (3.5.14)

the function f (x) is of lower order than g(x) for x → ∞, and is denoted by f = o(g).
Definition 3.5.3 implies that, when f = o(g) for x → ∞, f (x) increases slower
than g(x) as x → ∞.
Definition 3.5.4 (big O) Suppose that f (x) > 0 and g(x) > 0 for a sufficiently large
number x. When there exists a natural number M satisfying

f(x)/g(x) ≤ M   (3.5.15)

for a sufficiently large number x, f (x) is said to be of, at most, the order of g(x) for
x → ∞, and is denoted by f = O(g).
From Definitions 3.5.3 and 3.5.4, when f (x) and g(x) are both positive for a
sufficiently large x, we have f = O(g) if f = o(g).
Example 3.5.4 We have ln x = o(x) for x → ∞ from lim_{x→∞} (ln x)/x = 0 for the two
functions f(x) = ln x and g(x) = x, and x^2 = o(x^3 + 1) for x → ∞ from
lim_{x→∞} x^2/(x^3 + 1) = 0. ♦

Example 3.5.5 We have x + sin x = O(x) for x → ∞ because (x + sin x)/x ≤ 2 when x
is sufficiently large. We also have e^x + x^2 = O(e^x) for x → ∞ because (e^x + x^2)/e^x → 1
when x → ∞, and x = o(e^x) for x → ∞ because x/e^x → 0 when x → ∞. ♦

Example 3.5.6 (Khuri 2003) We have cos x = O(1) and sin x = O(|x|) for any
real number x, and x = O(x^2) and x^2 + 2x = O(x^2) for a large number x. ♦

Theorem 3.5.1 When npq ≫ 1, we have the approximation

nC_k p^k (1 − p)^{n−k} ≈ {1/√(2πnpq)} exp{−(k − np)^2/(2npq)}   (3.5.16)

for k = np ± O(√(npq)), that is, for k ∈ (np − a√(npq), np + a√(npq)) with some
a > 0.

Proof The theorem is proved in Appendix 3.2. ♠

Theorem 3.5.1 is called the de Moivre-Laplace theorem or the Gaussian approx-


imation of binomial distribution.
 
Example 3.5.7 Consider a random variable K ∼ b(10, 1/2). Then, we have P(K =
5) = P_10(5) = 10C_5 (1/2)^10 = 63/256 ≈ 0.246. The Gaussian approximation produces
P(K = 5) ≈ {1/√(2π × 10 × (1/2) × (1/2))} e^0 ≈ 0.252. ♦

Example 3.5.8 The distribution of even numbers from 1000 rollings of a fair die
is b(1000, 1/2). Thus, the Gaussian approximation of the probability that we have
500 times of even numbers is P_1000(500) ≈ {2π × 1000 × (1/2) × (1/2)}^{−1/2} ≈ 0.0252.
Similarly, P_1000(510) ≈ (500π)^{−1/2} e^{−100/500} ≈ 0.0207 from (3.5.16). ♦

Let us try to approximate P(k_1 ≤ k ≤ k_2) = Σ_{k=k_1}^{k_2} nC_k p^k (1 − p)^{n−k} by making
use of the steps in the proof of Theorem 3.5.1 shown in Appendix 3.2. First,
when npq ≫ 1, we have

Σ_{k=k_1}^{k_2} nC_k p^k (1 − p)^{n−k} ≈ Σ_{k=k_1}^{k_2} {1/√(2πnpq)} exp{−(k − np)^2/(2npq)}
                                      = Σ_{k=k_1}^{k_2} {1/√(2πnpq)} exp{−(k − np)^2/(2npq)} {(k + 1/2) − (k − 1/2)},

i.e.,

Σ_{k=k_1}^{k_2} nC_k p^k (1 − p)^{n−k} ≈ {1/√(2πnpq)} ∫_{k_1}^{k_2} exp{−(x − np)^2/(2npq)} dx
                                      = Φ((k_2 − np)/√(npq)) − Φ((k_1 − np)/√(npq))   (3.5.17)

for k_2 − k_1 = O(√(npq)). The integral in (3.5.17) implies that the approximation
error will be small when |k_1 − k_2| ≫ 1, but it could be large otherwise. To reduce
such an error, we often use the approximation

Σ_{k=k_1}^{k_2} nC_k p^k (1 − p)^{n−k} ≈ Φ((k_2 − np + 1/2)/√(npq)) − Φ((k_1 − np − 1/2)/√(npq)),   (3.5.18)

which is called the continuity correction and is considered also in (6.2.66), Example
6.2.26, and Exercise 6.10.

Example 3.5.9 In Example 3.5.8, the probability of 500, 501, or 502 times of
even numbers is P_1000(500) + P_1000(501) + P_1000(502) ≈ 0.0754. With the two
approximations above, we have Φ(2/√250) − Φ(0/√250) ≈ 0.0503 from (3.5.17) and
Φ(2.5/√250) − Φ(−0.5/√250) ≈ 0.0754 from (3.5.18). ♦
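The three numbers in Example 3.5.9 can be reproduced with a few lines of Python. The sketch below is ours (Φ is evaluated through math.erf, cf. (3.5.8)):

```python
import math
from math import comb

def Phi(x):
    # Standard normal cdf via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 1000, 0.5
sigma = math.sqrt(n * p * (1 - p))

exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(500, 503))
plain = Phi((502 - n * p) / sigma) - Phi((500 - n * p) / sigma)          # (3.5.17)
corrected = Phi((502.5 - n * p) / sigma) - Phi((499.5 - n * p) / sigma)  # (3.5.18)
print(round(exact, 4), round(plain, 4), round(corrected, 4))  # 0.0754 0.0503 0.0754
```

The continuity correction recovers all four decimals of the exact sum, while the uncorrected integral does not.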

Theorem 3.5.2 If np → λ when n → ∞ and p → 0, then

nC_k p^k (1 − p)^{n−k} → (λ^k/k!) e^{−λ}   (3.5.19)

for k = O(np).

Proof A proof is given in Appendix 3.2. ♠

Theorem 3.5.2 is called the Poisson limit theorem or the Poisson approximation
of binomial distribution.
 
Example 3.5.10 For b(1000, 10^{−3}), we have P_1000(0) = 0.999^1000 ≈ 0.3677, for
which the Poisson approximation provides exp(−np) = exp(−1) ≈ 0.3679. ♦

Example 3.5.11 In Example 3.5.7, we have observed that P(K = 5) = 0.246 with
its Gaussian approximation 0.252 for K ∼ b(10, 1/2). From the Poisson approxima-
tion, we can get (5^5/5!) e^{−5} ≈ 0.1755. In Example 3.5.10, when K ∼ b(1000, 10^{−3}),
the Poisson approximation for P(K = 0) ≈ 0.3677 is 0.3679. With the Gaussian
approximation, we would get {1/√(1.998π)} exp(−1/1.998) ≈ 0.2420 from the normal distri-
bution N(1, 0.999). ♦

As we can see from Examples 3.5.10 and 3.5.11, the Poisson approximation is
more accurate than the Gaussian approximation when p is close to 0, and vice versa.
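This comparison is easy to reproduce. The Python sketch below (ours, not the book's) recomputes the numbers of Examples 3.5.10 and 3.5.11 and confirms which approximation wins in each regime:

```python
import math
from math import comb

def binom_pmf(n, p, k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return lam ** k / math.factorial(k) * math.exp(-lam)

def gauss_pmf(n, p, k):
    # The de Moivre-Laplace approximation (3.5.16).
    npq = n * p * (1 - p)
    return math.exp(-(k - n * p) ** 2 / (2 * npq)) / math.sqrt(2 * math.pi * npq)

# p close to 0: the Poisson approximation is closer (Example 3.5.10).
exact0 = binom_pmf(1000, 1e-3, 0)
print(abs(poisson_pmf(1.0, 0) - exact0) < abs(gauss_pmf(1000, 1e-3, 0) - exact0))  # True

# p = 1/2: the Gaussian approximation is closer (Example 3.5.7).
exact5 = binom_pmf(10, 0.5, 5)
print(abs(gauss_pmf(10, 0.5, 5) - exact5) < abs(poisson_pmf(5.0, 5) - exact5))  # True
```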

3.5.3 Poisson Random Variables

Definition 3.5.5 (Poisson random variable) A random variable with the pmf

p_k = (λ^k/k!) e^{−λ}   (3.5.20)

for k = 0, 1, . . . is called the Poisson random variable. The distribution is denoted


by P(λ) and λ > 0 is called the Poisson rate or Poisson parameter.

The Poisson pmf {p_k}_{k=0}^∞ is a sequence which first increases and then decreases
or is a decreasing sequence: it is maximum at k = 0 when λ < 1; at k = λ and λ − 1
when λ ≥ 1 is an integer; and at k = [λ] when λ is a non-integer not smaller than 1.
We have the cdf

F_K(x) = Σ_{k=0}^{⌊x⌋} e^{−λ} λ^k/k!   (3.5.21)

of the Poisson random variable K ∼ P(λ).


Assume that we choose n points randomly in the interval (−T/2, T/2). Then,

P(k points exist in the interval (t_1, t_2)) = nC_k (t_a/T)^k (1 − t_a/T)^{n−k}
                                            = nC_k p^k q^{n−k},   (3.5.22)

where −T/2 ≤ t_1 < t_2 ≤ T/2, p = (t_2 − t_1)/T = t_a/T, and q = 1 − p. Now, even when an inter-
val D_a of length t_a does not overlap with another interval D_b of length t_b, the two
events A = {k_a points in D_a} and B = {k_b points in D_b} are not independent of each
other if the interval (−T/2, T/2) is finite because

P(AB) = {n!/(k_a! k_b! (n − k_a − k_b)!)} (t_a/T)^{k_a} (t_b/T)^{k_b} (1 − t_a/T − t_b/T)^{n−k_a−k_b}
      ≠ P(A)P(B).   (3.5.23)

Now, if we let λ = n/T, we have p = t_a/T → 0 and np = λt_a when n → ∞ and
T → ∞. Thus, recollecting the Poisson limit theorem (3.5.19), we
get P(k_a points in D_a) → e^{−λt_a} (λt_a)^{k_a}/k_a!, (1 − t_a/T − t_b/T)^{n−k_a−k_b} → e^{−n(t_a/T + t_b/T)} =
exp{−λ(t_a + t_b)}, and n!/{k_a! k_b! (n − k_a − k_b)!} → n^{k_a+k_b}/(k_a! k_b!). Therefore, P(AB) =
e^{−λt_a} {(λt_a)^{k_a}/k_a!} e^{−λt_b} {(λt_b)^{k_b}/k_b!}, i.e.,

P(AB) = P(A)P(B)   (3.5.24)



implying that the two events {ka points in Da } and {kb points in Db } are independent
of each other. The set of infinitely many points described above is called the random
Poisson points or Poisson points as defined below.
Definition 3.5.6 (random Poisson points) A collection of points satisfying the two
properties below is called random Poisson points, or simply Poisson points, with
parameter λ.
(1) P(k points in an interval of length t) = e^{−λt} (λt)^k/k!.
(2) If two intervals Da and Db are non-overlapping, then the events {ka points in
Da } and {kb points in Db } are independent of each other.
The parameter λ in Definition 3.5.6 represents the average number of points in a
unit interval.
Example 3.5.12 Assume a set of Poisson points with parameter λ. Find the prob-
ability P(A|C) of A = {ka points in Da = (t1 , t2 )} when there are kc points in
Dc = (t1 , t3 ), where t1 ≤ t2 ≤ t3 .
Solution Let B = {kb points in Db } and C = {kc points in Dc }, where Db =
(t2 , t3 ) and kb = kc − ka . Then, because AC = {ka points in Da , kc points in Dc }
= {ka points in Da , kb points in Db } = AB, and Da and Db are non-overlapping,
we get P(A|C) = P(AC)/P(C) = P(AB)/P(C), i.e.,

P(A|C) = P(A)P(B)/P(C).   (3.5.25)

We thus finally get P(A|C) = e^{−λt_a} {(λt_a)^{k_a}/k_a!} e^{−λt_b} {(λt_b)^{k_b}/k_b!} {e^{−λt_c} (λt_c)^{k_c}/k_c!}^{−1}, i.e.,

P(A|C) = {k_c!/(k_a! k_b!)} (t_a/t_c)^{k_a} (t_b/t_c)^{k_b},   (3.5.26)

where t_a = t_2 − t_1, t_b = t_3 − t_2, and t_c = t_3 − t_1. ♦
Example 3.5.13 Assume a set of Poisson points. Let X be the distance from a fixed
point t_0 to the nearest point in the right-hand direction. Then, the cdf F_X(x) =
P(X ≤ x) = P(at least one point exists in (t_0, t_0 + x)) can be obtained as

F_X(x) = 1 − P(no point in (t_0, t_0 + x))
       = 1 − e^{−λx}   (3.5.27)

for x ≥ 0. Thus, we have the pdf f_X(x) = λe^{−λx} u(x) and X is an exponential random
variable. ♦
Example 3.5.14 Consider a constant α and a set of Poisson points with parameter
λ. Then, for the number N of Poisson points in the interval (0, α), we have P(N =
k) = e^{−λα} (λα)^k/k!. In other words, N ∼ P(λα). ♦
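The construction above — n uniform points scattered on a long interval — can be simulated to confirm that the count in (0, α) is approximately P(λα). The following Python sketch is our illustration (all names and parameter values are arbitrary):

```python
import math
import random

def counts_in_subinterval(lam, alpha, total_len, trials, seed=1):
    # Scatter n = lam * total_len uniform points on (0, total_len) and,
    # for each trial, count how many fall in (0, alpha).
    rng = random.Random(seed)
    n = int(lam * total_len)
    return [sum(1 for _ in range(n) if rng.random() * total_len < alpha)
            for _ in range(trials)]

lam, alpha = 2.0, 1.5
counts = counts_in_subinterval(lam, alpha, total_len=100.0, trials=4000)
mu = lam * alpha
for k in range(4):
    emp = sum(1 for c in counts if c == k) / len(counts)
    pmf = math.exp(-mu) * mu ** k / math.factorial(k)  # e^{-lam*alpha} (lam*alpha)^k / k!
    print(k, round(emp, 3), round(pmf, 3))  # empirical frequency vs Poisson pmf
```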

3.5.4 Exponential Random Variables

Definition 3.5.7 (exponential random variable) A random variable with the pdf

f (x) = λe−λx u(x) (3.5.28)

is called an exponential random variable, where the parameter λ > 0 is called the
exponential rate or rate.

When X is an exponential random variable with rate λ > 0, the mgf M(t) =
∫_0^∞ e^{tx} λe^{−λx} dx is

M(t) = λ/(λ − t),  t < λ,   (3.5.29)

with which we can easily obtain the expected value E{X} = 1/λ, the second moment
E{X^2} = 2λ/(λ − t)^3 |_{t=0} = 2/λ^2, and the variance Var{X} = 2/λ^2 − 1/λ^2 = 1/λ^2. In addition,
from the cdf

F(x) = (1 − e^{−λx}) u(x)   (3.5.30)

of X, we get

P(X > s + t | X > t) = P(X > s)   (3.5.31)

for s, t ≥ 0 because P(X > s + t | X > t) = P(X > s + t)/P(X > t) = {1 − F(s + t)}/{1 − F(t)} = e^{−λs}. The
property expressed by (3.5.31) is called the memoryless property of the exponential
distribution.
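Since 1 − F(x) = e^{−λx}, the memoryless property can be checked in a line of code. A small Python sketch (ours; the values of λ, s, and t are arbitrary):

```python
import math

def tail(x, lam):
    # P(X > x) = 1 - F(x) = exp(-lam * x) for the exponential distribution.
    return math.exp(-lam * x)

lam, s, t = 0.5, 2.0, 3.0
conditional = tail(s + t, lam) / tail(t, lam)  # P(X > s + t | X > t)
print(math.isclose(conditional, tail(s, lam)))  # True: the memoryless property (3.5.31)
```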

Example 3.5.15 Assume that the lifetime of an electric bulb follows an exponential
distribution. The result (3.5.31) implies that, if the electric bulb is on at some moment,
the distribution of the remaining lifetime of the bulb is the same as that of the original
lifetime: the remaining lifetime of the bulb at any instant follows the same distribution
as a new bulb. In other words, for a bulb that is on at time t, the probability that the
bulb will be on at t + s is simply the probability that the bulb will be on for s time
units. This can be exemplified as follows. When a person finds a bulb is lit in a
place and the person does not know from when the bulb has been lit, how long does
the person expect the bulb will be on? Surprisingly, if the lifetime of the bulb is
an exponential random variable, the remaining lifetime is the same as a new bulb.
In a slightly different way, this can be described as ‘the past does not influence the
future’, which is called the Markov property. ♦

Rewriting (3.5.31), we get

P(X > s + t) = P(X > s)P(X > t),   (3.5.32)

which is satisfied by only the exponential distribution (Komjath and Totik 2006) among
continuous^{12} distributions. This can be proved as follows: Assume a function g(·)
satisfies^{13}

g(s + t) = g(s)g(t).   (3.5.33)

Then, g(1/n) = g^{1/n}(1) and g(m/n) = g(1/n + 1/n + · · · + 1/n) = g^m(1/n) because g(1) =
g(1/n + 1/n + · · · + 1/n) = g^n(1/n) for any choice of natural numbers m and n. Thus,
g(m/n) = g^{m/n}(1), and we can write g(x) = g^x(1) when g(·) is continuous from the
right-hand side for x ≥ 0. Now, because g(1) = g^2(1/2) ≥ 0, we have g(x) = e^{−λx},
where λ = − log{g(1)}. From this result and (3.5.30), we have the conclusion.

Example 3.5.16 The time required to finish a transaction in a bank is an exponential


random variable with a mean of 10 minutes. Assume that every customer will leave
the bank immediately after finishing the transaction. Find the probability P1 for a
customer to wait more than 15 minutes to finish the transaction after the arrival at
the bank. Also find the probability P2 that a customer will still be waiting at 10:15
when the customer arrived at the bank at 10:00 and has been waiting for 10 min.

Solution Let X be the waiting time for a customer at the bank to finish the transaction,
and denote by F the cdf of X. Then, the waiting time is the same as the time required
to finish the transaction, and X is an exponential random variable with rate λ = 1/10.
Thus, we get P_1 = P(X > 15) = 1 − P(X ≤ 15) = 1 − F(15) = e^{−15λ} = e^{−3/2} ≈
0.2231. Next, because an exponential random variable is not influenced by the past,
the probability P_2 is the same as the probability that a customer will wait more than
5 minutes: in other words, P_2 = P(X > 5) = e^{−5λ} = e^{−1/2} ≈ 0.6065. ♦

Example 3.5.17 The lifetime of an electric bulb is an exponential random variable


with a mean of 10 hrs. A person returns to a room and finds that the bulb is lit. The
person needs five hours to complete a task in the room. Find the probability P5 that
the person can complete the task before the light gets off. Discuss what happens if
the lifetime is not an exponential random variable.

Solution Let X be the lifetime of the electric bulb. Then, because λ = 1/10, we have
P_5 = P(X > 5) = 1 − F(5) = e^{−5λ} = e^{−1/2} ≈ 0.6065. Next, assume that the life-
time is not an exponential random variable. Then, denoting by t the time that the
bulb was lit on, we have P_5 = P(X > t + 5 | X > t), i.e.,

12 Only the geometric distribution satisfies (3.5.32) among discrete distributions.


13 A similar relationship g(s + t) = g(s) + g(t) is called the Cauchy equation.

P_5 = {1 − F(t + 5)}/{1 − F(t)},   (3.5.34)

where F is the cdf of the lifetime of the bulb. Thus, the probability will be available
only if we know how long the bulb has been lit on at t. ♦

In discussing the memoryless property of the exponential random variable, we


often employ the failure rate function, also called the hazard rate function or condi-
tional rate of failure. The failure rate function is the probability that an object that
has been operated for t time units will become inoperable within the next dt time
units, i.e., the probability that the object will become inoperable after operating t
time units. As described in (3.4.28), the failure rate function β(t) can be expressed
as
β(t) = f(t)/{1 − F(t)}   (3.5.35)

for a random variable with pdf f and cdf F. The function β(t) is the conditional rate
of failure for an object to become inoperable after being operated for t time units.
Let us now discuss the failure rate function β(t) for an exponential random vari-
able. The rate of failure of an object that has operated for t time units is the same
as that of a new object because an exponential random variable is not influenced by
the past. In other words, as we can observe from β(t) = λe^{−λt}/e^{−λt} = λ, the failure rate
function for an exponential random variable is the rate of the exponential random
variable, a constant independent of t. The rate is the inverse of the mean and repre-
sents how many events occur on the average over a unit time interval: for example,
when the time interval between occurrences of an event is an exponential random
variable with mean 1/10, the rate λ = 10 tells us that the event occurs ten times on the
average in a unit time.

Example 3.5.18 (Yates and Goodman 1999) Consider an exponential random vari-
able X with parameter λ. Show that K = ⌈X⌉ is a geometric random variable with
parameter p = 1 − e^{−λ}.

Solution The pmf of K is p_K(k) = P(k − 1 < X ≤ k) = F_X(k) − F_X(k − 1) =
exp{−λ(k − 1)} − exp(−λk) = {exp(−λ)}^{k−1} {1 − exp(−λ)} = p(1 − p)^{k−1}, which
implies that K = ⌈X⌉ is a geometric random variable with parameter p = 1 − e^{−λ}.
Note that, although ⌊x⌋ + 1 ≠ ⌈x⌉ in general for a real number x, the distribution of
⌊X⌋ + 1 here is the same as that of ⌈X⌉ because X is a continuous random variable:
in other words, ⌊X⌋ + 1 is equal to ⌈X⌉ in distribution, which is often denoted by
⌊X⌋ + 1 =_d ⌈X⌉. ♦
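The claim of Example 3.5.18 can be checked by simulation: sampling X exponentially and taking ⌈X⌉ should reproduce the geometric pmf p(1 − p)^{k−1} with p = 1 − e^{−λ}. A Python sketch of ours (λ = 0.7 and the seed are arbitrary):

```python
import math
import random

lam = 0.7
p = 1.0 - math.exp(-lam)  # geometric parameter of Example 3.5.18
rng = random.Random(3)

# K = ceil(X) for exponential samples X with rate lam.
samples = [math.ceil(rng.expovariate(lam)) for _ in range(100000)]

for k in (1, 2, 3):
    emp = samples.count(k) / len(samples)
    geo = p * (1.0 - p) ** (k - 1)
    print(k, round(emp, 3), round(geo, 3))  # empirical frequency vs geometric pmf
```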

Appendices

Appendix 3.1 Cumulative Distribution Functions and Their Inverse Functions

(A) Cumulative Distribution Functions

We now discuss the cdf in the context of function theory (Gelbaum and Olmsted
1964).

Definition 3.A.1 (cdf) A real function F(x) possessing all of the three following
properties is a cdf:
(1) The function F(x) is non-decreasing: F(x + h) ≥ F(x) for h > 0.
(2) The function F(x) is continuous from the right-hand side: F(x^+) = F(x).
(3) The function F(x) has the limits lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.

A cdf is a finite and monotonic function. A point x such that F(x + ε) − F(x −
ε) > 0 for every positive number ε, a point such that F(x) = F(x^−), and a point such
that F(x^+) = F(x) ≠ F(x^−) is called an increasing point, a continuous point, and a
discontinuity, respectively, of F(x). Here, as we have already seen in Definition 1.3.3,
p_x = F(x^+) − F(x^−) = F(x) − F(x^−) is the jump of F(x) at x. A cdf may have only
type 1 discontinuity, i.e., jump discontinuity, with every jump between 0 and 1.

Example 3.A.1 Consider the function g shown in Fig. 3.20. Here, y1 is the local
minimum of y = g(x), and x1 and x11 are the two solutions to y1 = g(x). In addition,
y2 is the local maximum of y = g(x), and x2 and x22 are the solutions to y2 = g(x).
Let x3 < x4 < x5 be the X coordinates of the crossing points of the straight line
Y = y and the function Y = g(X ) for y1 < y < y2 . Then, x11 < x3 < x2 < x4 <
x1 < x5 < x22 and y = g (x3 ) = g (x4 ) = g (x5 ) for y1 < y < y2 . Obtain the cdf FY
of Y = g(X ) in terms of the cdf FX of X , and discuss if the cdf FY is a continuous
function.

Fig. 3.20 The function Y = g(X)



Solution When y > y_2 or y < y_1, we get

F_Y(y) = F_X(g^{−1}(y))   (3.A.1)

from Fig. 3.20 because P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^{−1}(y)). When y =
y_1, because {Y ≤ y_1} = {g(X) ≤ y_1} = {X ≤ x_{11}} + {X = x_1}, we get

F_Y(y) = F_X(x_{11}) + P(X = x_1).   (3.A.2)

In addition, when y_1 < y < y_2, the cdf is

F_Y(y) = F_X(x_3) + F_X(x_5) − F_X(x_4) + P(X = x_4)   (3.A.3)

because {g(X) ≤ y} = {X ≤ x_3} + {x_4 ≤ X ≤ x_5} = {X ≤ x_3} + {x_4 < X ≤
x_5} + {X = x_4}. Finally, from {g(X) ≤ y_2} = {X ≤ x_{22}} we get

F_Y(y) = F_X(x_{22})   (3.A.4)

when y = y_2. Combining the results (3.A.1)–(3.A.4), we have the cdf of Y as

          ⎧ F_X(g^{−1}(y)),                   y < y_1 or y > y_2,
          ⎪ F_X(x_{11}) + P(X = x_1),         y = y_1,
F_Y(y) =  ⎨ F_X(x_3) + F_X(x_5) − F_X(x_4)                           (3.A.5)
          ⎪    + P(X = x_4),                  y_1 < y < y_2,
          ⎩ F_X(x_{22}),                      y = y_2.

Let us next discuss the continuity of the cdf (3.A.5). When y ↑ y_1, because
g^{−1}(y) → x_{11}^−, we get lim_{y↑y_1} F_Y(y) = lim_{y↑y_1} F_X(g^{−1}(y)), i.e.,

lim_{y↑y_1} F_Y(y) = F_X(x_{11}^−).   (3.A.6)

When y ↓ y_1, we have lim_{y↓y_1} F_Y(y) = lim_{y↓y_1} {F_X(x_3) + F_X(x_5) − F_X(x_4)
+ P(X = x_4)} = F_X(x_{11}^+) + F_X(x_1^+) − F_X(x_1^−) + P(X = x_1^−), i.e.,

lim_{y↓y_1} F_Y(y) = F_X(x_{11}) + P(X = x_1)   (3.A.7)

because x_3 → x_{11}^+, x_4 → x_1^−, x_5 → x_1^+,^{14} F_X(x_{11}^+) = F_X(x_{11}), F_X(x_1^+) −
F_X(x_1^−) = P(X = x_1), and, for any type of random variable X, P(X = x_1^−)
= 0. Thus, from the second line of (3.A.5), (3.A.6), and (3.A.7), the continuity of the
cdf F_Y(y) at y = y_1 can be summarized as follows: The cdf F_Y(y) is (A) continuous
from the right-hand side at y = y_1 and (B) continuous from the left-hand side at
y = y_1 only if F_X(x_{11}) + P(X = x_1) − F_X(x_{11}^−) = P(X = x_1) + P(X = x_{11}) is
0 or, equivalently, only if P(X = x_1) = P(X = x_{11}) = 0.
Next, when y ↑ y_2, recollecting that x_3 → x_2^−, x_4 → x_2^+, x_5 → x_{22}^−,
F_X(x_2^+) − F_X(x_2^−) = P(X = x_2), and P(X = x_2^+) = 0, we get lim_{y↑y_2} F_Y(y) =
lim_{y↑y_2} {F_X(x_3) + F_X(x_5) − F_X(x_4) + P(X = x_4)} = F_X(x_2^−) + F_X(x_{22}^−) −
F_X(x_2^+) + P(X = x_2^+), i.e.,

lim_{y↑y_2} F_Y(y) = F_X(x_{22}^−) − P(X = x_2).   (3.A.8)

In addition,

lim_{y↓y_2} F_Y(y) = lim_{y↓y_2} F_X(g^{−1}(y)) = F_X(x_{22})   (3.A.9)

because F_X(x_{22}^+) = F_X(x_{22}) and g^{−1}(y) → x_{22}^+ for y ↓ y_2. Thus, from the fourth
case of (3.A.5), (3.A.8), and (3.A.9), the continuity of the cdf F_Y(y) at y = y_2
can be summarized as follows: The cdf F_Y(y) is (A) continuous from the right-
hand side at y = y_2 and (B) continuous from the left-hand side at y = y_2 only
if F_X(x_{22}) − F_X(x_{22}^−) + P(X = x_2) = P(X = x_2) + P(X = x_{22}) is 0 or, equiv-
alently, only if P(X = x_2) = P(X = x_{22}) = 0. Exercises 3.56 and 3.57 also deal
with the continuity of the cdf. ♦

14 Note that P(X = k^−) = 0 even for a discrete random variable X because the value p_X(k) =
P(X = k) is 0 when k is not an integer for a pmf p_X(k).
Let {x_ν} be the set of discontinuities of the cdf F(x), and p_{x_ν} = F(x_ν^+) − F(x_ν^−)
be the jump of F(x) at x = x_ν. Denote by Ψ(x) the sum of jumps of F(x) at
discontinuities not larger than x, i.e.,

Ψ(x) = Σ_{x_ν ≤ x} p_{x_ν}.   (3.A.10)

The function Ψ (x) is increasing only at {xν } and is constant in a closed interval not
containing xν : thus, it is a step-like function described following Definition 3.1.7.
If we now let
ψ(x) = F(x) − Ψ (x), (3.A.11)

then ψ(x) is a continuous function while Ψ(x) is continuous only from the right-
hand side. In addition, Ψ(x) and ψ(x) are both non-decreasing and satisfy Ψ(−∞)
= ψ(−∞) = 0, Ψ(+∞) = a_1 ≤ 1, and ψ(+∞) = b ≤ 1. Here, the functions
F_d(x) = (1/a_1)Ψ(x) and F_c(x) = (1/b)ψ(x) are both cdf, where F_d(x) is a step-like func-
tion and F_c(x) is a continuous function. Rewriting (3.A.11), we get

Fig. 3.21 The decomposition of cdf F(x) = Ψ(x) + ψ(x) with a step-like function Ψ(x) and a
continuous function ψ(x)

F(x) = Ψ (x) + ψ(x), (3.A.12)

i.e.,

F(x) = a1 Fd (x) + bFc (x). (3.A.13)

Figure 3.21 shows an example of the cdf F(x) = Ψ (x) + ψ(x) with a step-like
function Ψ (x) and a continuous function ψ(x).
The decomposition (3.A.13) is unique because the decomposition shown in
(3.A.12) is unique. Let us prove this result. Assume that two decompositions
of F(x) = Ψ (x) + ψ(x) = Ψ1 (x) + ψ1 (x) are possible. Rewriting this equation,
we get Ψ (x) − Ψ1 (x) = ψ1 (x) − ψ(x). The left-hand side is a step-like function
because it is the difference of two step-like functions and, similarly, the right-hand
side is a continuous function. Therefore, both sides should be 0. In other words,
Ψ1 (x) = Ψ (x) and ψ1 (x) = ψ(x). By the discussion so far, we have shown the fol-
lowing theorem:
Theorem 3.A.1 Any cdf F(x) can be decomposed as

F(x) = a1 Fd (x) + bFc (x) (3.A.14)

into a step-like cdf Fd (x) and a continuous cdf Fc (x), where 0 ≤ a1 ≤ 1, 0 ≤ b ≤ 1,


and a1 + b = 1.

In Theorem 3.A.1, F_d(x) and F_c(x) are called the discontinuous or discrete part
and continuous part, respectively, of F(x). Now, from Theorem 3.1.3, we see that
there exists a countable set D such that ∫_D dF_d(x) = 1, where the integral is the
Lebesgue-Stieltjes integral. The function Fc (x) is continuous but not differentiable
at all points. Yet, any cdf is differentiable at almost every point and thus, based on
the Lebesgue decomposition theorem, the continuous part Fc (x) in (3.A.14) can be
decomposed into two continuous functions Fac (x) and Fs (x) as

Fc (x) = b1 Fac (x) + b2 Fs (x), (3.A.15)



where b_1 ≥ 0, b_2 ≥ 0, and b_1 + b_2 = 1. The function F_ac(x) is the integral of the
derivative of F_ac(x): in other words, F_ac(x) = ∫_{−∞}^x F′_ac(y)dy, indicating that F_ac(x)
is an absolutely continuous function and that ∫_N dF_ac(x) = 0 for a set N of Lebesgue
measure 0. The function F_s(x) is a continuous function but the derivative is 0 at almost
every point. In addition, for a suitably chosen set N of Lebesgue measure 0, we have
∫_N dF_s(x) = 1, indicating that F_s(x) is a singular function. Combining (3.A.14) and
(3.A.15), we have the following theorem (Lukacs 1970):

Theorem 3.A.2 Any cdf F(x) can be decomposed into a step-like function Fd (x),
an absolutely continuous function Fac (x), and a singular function Fs (x) as

F(x) = a1 Fd (x) + a2 Fac (x) + a3 Fs (x), (3.A.16)

where a1 ≥ 0, a2 ≥ 0, a3 ≥ 0, and a1 + a2 + a3 = 1.

In (3.A.16), when one of the three coefficients a1 , a2 , and a3 is 1, the cdf is


called pure: when a1 = 1, a2 = 1, or a3 = 1, the cdf is a discrete cdf, an absolutely
continuous cdf, or a singular cdf, respectively. Almost all cdf’s are practically discrete
cdf’s or absolutely continuous cdf’s. Because a singular cdf is only theoretically
meaningful with very rare practical applications, an absolutely continuous cdf is
considered to be a continuous cdf and no singular cdf is considered in statistics. In
this book, a continuous cdf indicates an absolutely continuous cdf unless specified
otherwise.
When the intervals between discontinuity points are all the same for a distribution,
the discrete distribution is called a lattice distribution, the discontinuities are called
lattice points, and the interval between adjacent two lattice points is called the span.

Example 3.A.2 Let us consider an example of a singular cdf. Assume the closed
interval [0, 1] and the ternary expression

x = Σ_{i=1}^∞ a_i(x)/3^i = 0.a_1(x)a_2(x)a_3(x) · · ·   (3.A.17)

of a point x, where a_i(x) ∈ {0, 1, 2}, and recall the Cantor set C discussed in Example
1.1.46. Now, let n(x) be the location of the first 1 in the ternary expression (3.A.17) of x with
n(x) = ∞ if no 1 appears eventually. Define a cdf as (Romano and Siegel 1986)

         ⎧ 0,              x < 0,
         ⎪ g(x),           x ∈ ([0, 1] − C),
F(x) =   ⎨ Σ_j c_j/2^j,    x = Σ_j 2c_j/3^j ∈ C,   (3.A.18)
         ⎩ 1,              x ≥ 1,

i.e., as

         ⎧ 0,         x < 0,
F(x) =   ⎨ φ_C(x),    0 ≤ x ≤ 1,
         ⎩ 1,         x ≥ 1

         ⎧ 0,                                              x ≤ 0,
       = ⎨ 1/2^{1+n(x)} + Σ_{i=1}^{n(x)} a_i(x)/2^{i+1},    0 ≤ x ≤ 1,   (3.A.19)
         ⎩ 1,                                              x ≥ 1

based on the Cantor function φ_C(x) discussed in Example 1.3.11, where

         ⎧ 1/2,                                  1/3 < x < 2/3,
g(x) =   ⎨
         ⎩ 1/2^k + Σ_{j=1}^{k−1} c_j/2^j,         x ∈ A_{2c_1, 2c_2, . . ., 2c_{k−1}}   (3.A.20)

with the open interval A_{2c_1, 2c_2, . . ., 2c_{k−1}} defined by (1.1.41) and (1.1.42). Then, it is easy
to see that F(x) is a continuous cdf. In addition, as we have observed in Example
1.3.11, the derivative of F(x) is 0 at almost every point in ([0, 1] − C) and thus is
not a pdf. In other words, F(x) is not an absolutely continuous cdf but is a singular
cdf.
Some specific values of the cdf (3.A.19) are as follows: We have F(1/9) =
1/2^{1+2} + Σ_{i=1}^2 a_i(1/9)/2^{i+1} = 1/8 + 1/8 = 1/4 from 1/9 = 0.01_3 and n(1/9) = 2, we have
F(2/9) = 1/2^∞ + Σ_{i=1}^∞ a_i(2/9)/2^{i+1} = 1/4 from 2/9 = 0.02_3 and n(2/9) = ∞, and we have
F(1/3) = 1/2^{1+1} + Σ_{i=1}^1 a_i(1/3)/2^{i+1} = 1/2 from 1/3 = 0.1_3 and n(1/3) = 1. Similarly,
from 2/3 = 0.2_3 and n(2/3) = ∞, we have F(2/3) = 0 + 2/4 = 1/2; from 0 = 0.0_3 and
n(0) = ∞, we have F(0) = 0; and from 1 = 0.22 · · ·_3 and n(1) = ∞, we have
F(1) = 0 + Σ_{i=1}^∞ 2/2^{i+1} = 1. ♦

(B) Inverse Cumulative Distribution Functions

The inverse cdf F −1 , the inverse function of the cdf F, can be defined specifically
as

F −1 (u) = inf{x : F(x) ≥ u} (3.A.21)

for 0 < u < 1. Because F is a non-decreasing function, we have {x : F(x) ≥ y2 } ⊆


{x : F(x) ≥ y1 } when y1 < y2 and, consequently, inf{x : F(x) ≥ y1 } ≤ inf{x :
F(x) ≥ y2 }. Thus,

Fig. 3.22 The cdf $F_1$ and the inverse cdf $F_1^{-1}$

F −1 (y1 ) ≤ F −1 (y2 ) (3.A.22)

for y1 < y2 . In other words, like a cdf, an inverse cdf is a non-decreasing function.
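The definition (3.A.21) translates directly into a small numerical routine. The sketch below (the grid width and the uniform-cdf test case are assumptions made for illustration) approximates $F^{-1}(u) = \inf\{x : F(x) \ge u\}$ on a fine grid:

```python
import bisect

def make_inverse_cdf(F, lo, hi, n=100001):
    """Generalized inverse (3.A.21): F^{-1}(u) = inf{x : F(x) >= u},
    approximated on a uniform grid over [lo, hi]."""
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    Fs = [F(x) for x in xs]                  # non-decreasing since F is a cdf
    def F_inv(u):
        # index of the first grid point with F(x) >= u
        return xs[bisect.bisect_left(Fs, u)]
    return F_inv

# Assumed test case: the U[0, 1] cdf, F(x) = min(max(x, 0), 1)
F_inv = make_inverse_cdf(lambda x: min(max(x, 0.0), 1.0), -1.0, 2.0)
print(F_inv(0.25), F_inv(0.75))              # both close to 0.25 and 0.75
```

Because the list `Fs` is non-decreasing, `bisect_left` finds exactly the smallest grid point with $F(x) \ge u$, mirroring the infimum in (3.A.21).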

Example 3.A.3 Consider the cdf

$$F_1(x) = \begin{cases} 0, & x \le 0; \\ \frac{x}{3}, & 0 \le x < 1; \\ \frac{1}{2}, & 1 \le x \le 2; \\ \frac{1}{2}(x-1), & 2 \le x \le 3; \\ 1, & x \ge 3. \end{cases} \qquad (3.A.23)$$

Then, the inverse cdf is

$$F_1^{-1}(u) = \begin{cases} 3u, & 0 < u \le \frac{1}{3}; \\ 1, & \frac{1}{3} \le u \le \frac{1}{2}; \\ 2u+1, & \frac{1}{2} < u < 1. \end{cases} \qquad (3.A.24)$$

For example, we have $F_1^{-1}\left(\frac{1}{3}\right) = \inf\left\{x : F_1(x) \ge \frac{1}{3}\right\} = \inf\{x : x \ge 1\} = 1$ and $F_1^{-1}\left(\frac{1}{2}\right) = \inf\left\{x : F_1(x) \ge \frac{1}{2}\right\} = \inf\{x : x \ge 1\} = 1$. Note that, unlike a cdf, an inverse cdf is continuous from the left-hand side. Figure 3.22 shows the cdf $F_1$ and the inverse cdf $F_1^{-1}$. ♦
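The two values computed above can be checked against the infimum definition (3.A.21). In this sketch, $F_1$ is coded piecewise and the infimum is approximated by scanning a fine grid (the grid resolution is an assumption):

```python
def F1(x):
    """The cdf of Example 3.A.3, coded piecewise."""
    if x <= 0:
        return 0.0
    if x < 1:
        return x / 3
    if x <= 2:
        return 0.5
    if x <= 3:
        return (x - 1) / 2
    return 1.0

def F1_inv(u, n=60001, hi=4.0):
    """inf{x : F1(x) >= u}, approximated by scanning a grid on [0, hi]."""
    for k in range(n):
        x = hi * k / (n - 1)
        if F1(x) >= u:
            return x
    return hi

print(F1_inv(1/3))   # 1.0, matching F1^{-1}(1/3) = 1
print(F1_inv(1/2))   # 1.0, matching F1^{-1}(1/2) = 1
print(F1_inv(3/4))   # 2.5, matching 2u + 1 at u = 3/4
```

The scan stops at the first grid point where $F_1(x) \ge u$, so the jump of $F_1$ at $x = 1$ makes both $u = \frac{1}{3}$ and $u = \frac{1}{2}$ map to 1, exactly as in (3.A.24).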

Theorem 3.A.3 (Hajek et al. 1999) Let F and F −1 be a cdf and its inverse, respec-
tively. Then,
 
F F −1 (u) ≥ u (3.A.25)

for 0 < u < 1; in addition,


 
F F −1 (u) = u (3.A.26)

if F is continuous.

Proof Let Su be the set15 of all x such that F(x) ≥ u, and x L be the smallest number
in Su . Then,

F −1 (u) = x L (3.A.27)

and

F (x L ) ≥ u (3.A.28)

because $F(x) \ge u$ for any point $x \in S_u$. In addition, because $S_u$ is the set of all $x$
such that $F(x) \ge u$, we have $F(x) < u$ for any point $x \in S_u^c$. Now, $x_L$ is the smallest
number in $S_u$ and thus $x_L - \varepsilon \in S_u^c$ because $x_L - \varepsilon < x_L$ when $\varepsilon > 0$. Consequently,
we have

$$F(x_L - \varepsilon) < u \qquad (3.A.29)$$

when $\varepsilon > 0$. Using (3.A.27) in (3.A.28), we get $F\left(F^{-1}(u)\right) \ge u$. Next, recollecting (3.A.27) and combining (3.A.28) and (3.A.29), we get $F\left(F^{-1}(u) - \varepsilon\right) < u \le F\left(F^{-1}(u)\right)$. In other words, $u$ is a number between $F\left(F^{-1}(u) - \varepsilon\right)$ and $F\left(F^{-1}(u)\right)$. Now, $u = F\left(F^{-1}(u)\right)$ because $\lim_{\varepsilon \to 0} F\left(F^{-1}(u) - \varepsilon\right) = F\left(F^{-1}(u)\right)$ if $F$ is a continuous function. ♠
Example 3.A.4 For the cdf

$$F_2(x) = \begin{cases} 0, & x < 0; \\ \frac{x}{4}, & 0 \le x < 1; \\ \frac{1}{2}, & 1 \le x < 2; \\ \frac{1}{4}(x+1), & 2 \le x < 3; \\ 1, & x \ge 3 \end{cases} \qquad (3.A.30)$$

and the inverse cdf

$$F_2^{-1}(u) = \begin{cases} 4u, & 0 < u \le \frac{1}{4}; \\ 1, & \frac{1}{4} \le u \le \frac{1}{2}; \\ 2, & \frac{1}{2} < u \le \frac{3}{4}; \\ 4u-1, & \frac{3}{4} \le u < 1, \end{cases} \qquad (3.A.31)$$

we have $F_2\left(F_2^{-1}\left(\frac{1}{4}\right)\right) = F_2(1) = \frac{1}{2} \ge \frac{1}{4}$, $F_2\left(F_2^{-1}\left(\frac{1}{2}\right)\right) = F_2(1) = \frac{1}{2} \ge \frac{1}{2}$, $F_2^{-1}(F_2(1)) = F_2^{-1}\left(\frac{1}{2}\right) = 1 \le 1$, and $F_2^{-1}\left(F_2\left(\frac{3}{2}\right)\right) = F_2^{-1}\left(\frac{1}{2}\right) = 1 \le \frac{3}{2}$. Figure 3.23 shows the cdf $F_2$ and the inverse cdf $F_2^{-1}$. ♦
Note that even if $F$ is a continuous function, we do not in general have $F^{-1}(F(x)) = x$ but rather
F −1 (F(x)) ≤ x. (3.A.32)

It is also noteworthy that

15 Here, because a cdf is continuous from the right-hand side, Su is either in the form Su = [x L , ∞)
or in the form Su = {x L , x L + 1, . . .}.

Fig. 3.23 The cdf $F_2$ and the inverse cdf $F_2^{-1}$

 
$$P\left(F^{-1}(F(X)) \ne X\right) = 0 \qquad (3.A.33)$$

when X ∼ F.
Example 3.A.5 Consider the cdf $F_1(x)$ and the inverse cdf $F_1^{-1}(u)$ discussed in Example 3.A.3. Then, we have

$$F_1\left(F_1^{-1}(u)\right) = \begin{cases} 0, & F_1^{-1}(u) \le 0, \\ \frac{1}{3}F_1^{-1}(u), & 0 \le F_1^{-1}(u) < 1, \\ \frac{1}{2}, & 1 \le F_1^{-1}(u) \le 2, \\ \frac{1}{2}\left(F_1^{-1}(u) - 1\right), & 2 \le F_1^{-1}(u) \le 3, \\ 1, & F_1^{-1}(u) \ge 3 \end{cases} = \begin{cases} u, & 0 < u < \frac{1}{3}; \\ \frac{1}{2}, & \frac{1}{3} \le u \le \frac{1}{2}; \\ u, & \frac{1}{2} < u < 1 \end{cases} \qquad (3.A.34)$$

and

$$F_1^{-1}(F_1(x)) = \begin{cases} 3F_1(x), & 0 < F_1(x) \le \frac{1}{3}; \\ 1, & \frac{1}{3} \le F_1(x) \le \frac{1}{2}; \\ 2F_1(x) + 1, & \frac{1}{2} < F_1(x) < 1 \end{cases} = \begin{cases} x, & 0 < x < 1; \\ 1, & 1 \le x \le 2; \\ x, & 2 < x < 3, \end{cases} \qquad (3.A.35)$$

which are shown in Fig. 3.24. The results (3.A.34) and (3.A.35) clearly confirm $F_1\left(F_1^{-1}(u)\right) \ge u$ shown in (3.A.25) and $F_1^{-1}(F_1(x)) \le x$ shown in (3.A.32). ♦
The results of Example 3.A.5 and Exercises 3.79 and 3.80 imply the following:
In general, even when a cdf is a continuous function, if it is constant over an interval, its inverse is discontinuous. Specifically, if $F(a) = F\left(b^-\right) = \alpha$ for $a < b$ or, equivalently, if $F(x) = \alpha$ over $a \le x < b$, the inverse cdf $F^{-1}(u)$ is discontinuous at $u = \alpha$ and $F^{-1}(\alpha) = a$. On the other hand, even if a cdf is not a continuous function, its inverse is a continuous function when the cdf is not constant over an interval.
Figure 3.25 shows an example of a discontinuous cdf with a continuous inverse.

Fig. 3.24 The results $F_1\left(F_1^{-1}(u)\right)$ and $F_1^{-1}(F_1(x))$ from the cdf $F_1$

Fig. 3.25 A discontinuous cdf $F$ with a continuous inverse $F^{-1}$

Theorem 3.A.4 (Hajek et al. 1999) If a cdf F is continuous, its inverse F −1 is strictly
increasing.

Proof When $F$ is continuous, assume $F^{-1}(y_1) = F^{-1}(y_2)$ for $y_1 < y_2$. Then, from (3.A.26), we have $y_1 = F\left(F^{-1}(y_1)\right) = F\left(F^{-1}(y_2)\right) = y_2$, which is a contradiction to $y_1 < y_2$. In other words,

$$F^{-1}(y_1) \ne F^{-1}(y_2) \qquad (3.A.36)$$

when $y_1 < y_2$. From (3.A.22) and (3.A.36), we have $F^{-1}(y_1) < F^{-1}(y_2)$ for $y_1 < y_2$ when the cdf $F$ is continuous. ♠

Theorem 3.A.5 (Hajek et al. 1999) When the pdf $f$ is continuous, we have

$$\frac{d}{du} F^{-1}(u) = \frac{1}{f\left(F^{-1}(u)\right)} \qquad (3.A.37)$$

if $f\left(F^{-1}(u)\right) \ne 0$, where $F$ is the cdf.

Proof When $f$ is continuous, $F$ is also continuous. Letting $F^{-1}(u) = v$, we get $\frac{d}{du} F^{-1}(u) = \frac{dv}{du} = \frac{1}{f(v)}$, i.e.,

$$\frac{d}{du} F^{-1}(u) = \frac{1}{f\left(F^{-1}(u)\right)} \qquad (3.A.38)$$

because $F(v) = F\left(F^{-1}(u)\right) = u$ and $du = f(v)\,dv$ from (3.A.26). ♠
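Relation (3.A.37) can be verified numerically for a concrete case. The sketch below uses the exponential cdf $F(x) = 1 - e^{-x}$ (an assumed example with a continuous, positive pdf on $(0, \infty)$) and compares a central-difference derivative of $F^{-1}$ with $1/f\left(F^{-1}(u)\right)$:

```python
import math

def F_inv(u):
    # closed-form inverse of F(x) = 1 - e^{-x}
    return -math.log1p(-u)

def f(x):
    # pdf of the assumed exponential example
    return math.exp(-x)

u, h = 0.7, 1e-6
numeric = (F_inv(u + h) - F_inv(u - h)) / (2 * h)   # central difference
analytic = 1.0 / f(F_inv(u))                        # right side of (3.A.37)
print(numeric, analytic)   # both close to 1/(1 - 0.7)
```

Here $f\left(F^{-1}(u)\right) = 1 - u$, so both quantities equal $\frac{1}{1-u}$ up to discretization error.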

Appendix 3.2 Proofs of Theorems

(A) Proof of Theorem 3.5.1

Letting $k - np = l$, rewrite $P_n(k) = {}_n C_k\, p^k (1-p)^{n-k}$ as

$${}_n C_k\, p^k (1-p)^{n-k} = \frac{n!}{(np+l)!\,(nq-l)!}\, p^{np+l} q^{nq-l} = \frac{n!\, p^{np} q^{nq}}{(np)!\,(nq)!} \times \frac{(np)!\,(nq)!}{(np+l)!\,(nq-l)!}\, \frac{p^l}{q^l}. \qquad (3.A.39)$$

First, using the Stirling approximation $n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$, which can be obtained from $\sqrt{2\pi n}\left(\frac{n}{e}\right)^n < n! < \sqrt{2\pi n}\left(1 + \frac{1}{4n}\right)\left(\frac{n}{e}\right)^n$, the first part of the right-hand side of (3.A.39) becomes

$$\frac{n!\, p^{np} q^{nq}}{(np)!\,(nq)!} \approx \frac{\sqrt{2\pi n}\, n^n e^{-n}\, p^{np} q^{nq}}{\sqrt{2\pi np}\,(np)^{np} e^{-np}\, \sqrt{2\pi nq}\,(nq)^{nq} e^{-nq}} = \frac{1}{\sqrt{2\pi npq}}. \qquad (3.A.40)$$

The second part of the right-hand side of (3.A.39) can be rewritten as

$$\frac{(np)!\,(nq)!}{(np+l)!\,(nq-l)!}\, \frac{p^l}{q^l} = \frac{(np)!\,(npq)^l}{(np+l)!\, q^l} \times \frac{(nq)!\, p^l}{(nq-l)!\,(npq)^l}. \qquad (3.A.41)$$

Letting $t_j = \frac{j}{npq}$ and using $e^x \approx 1 + x$ for $x \approx 0$, the first part of the right-hand side of (3.A.41) becomes $\frac{(np)!\,(npq)^l}{(np+l)!\, q^l} = \frac{(np)^l}{(np+l)(np+l-1)\cdots(np+1)} = \prod_{j=1}^{l}\left(1 + \frac{j}{np}\right)^{-1} = \prod_{j=1}^{l}\left(1 + q t_j\right)^{-1} \approx \prod_{j=1}^{l} e^{-q t_j}$, and the second part of the right-hand side of (3.A.41) can be rewritten as $\frac{(nq)!\, p^l}{(nq-l)!\,(npq)^l} = \frac{1}{(nq)^l}\prod_{j=1}^{l}(nq+1-j) \approx \prod_{j=1}^{l}\left(1 - \frac{j}{nq}\right) = \prod_{j=1}^{l}\left(1 - p t_j\right) \approx \prod_{j=1}^{l} e^{-p t_j}$. Employing these two results in (3.A.41), we get $\frac{(np)!\,(nq)!}{(np+l)!\,(nq-l)!}\,\frac{p^l}{q^l} \approx \prod_{j=1}^{l} e^{-q t_j} e^{-p t_j} = \prod_{j=1}^{l} e^{-t_j} = \exp\left(-\frac{l(l+1)}{2npq}\right)$, i.e.,

$$\frac{(np)!\,(nq)!}{(np+l)!\,(nq-l)!}\, \frac{p^l}{q^l} \approx \exp\left(-\frac{l^2}{2npq}\right). \qquad (3.A.42)$$

Now, recollecting $l = k - np$, we get the desired result (3.5.16) from (3.A.39), (3.A.40), and (3.A.42).
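The approximation just derived can be checked numerically. The sketch below compares the exact binomial probability with $\frac{1}{\sqrt{2\pi npq}}\exp\left(-\frac{(k-np)^2}{2npq}\right)$; the values of $n$, $p$, and $k$ are arbitrary assumptions for illustration:

```python
import math

n, p = 1000, 0.3          # assumed values; np = 300
q, k = 1 - p, 310         # k chosen near the mean np

exact = math.comb(n, k) * p**k * q**(n - k)
approx = math.exp(-(k - n*p)**2 / (2*n*p*q)) / math.sqrt(2*math.pi*n*p*q)
print(exact, approx)      # the two agree closely for large n
```

The relative error shrinks as $n$ grows and as $k$ stays within a few standard deviations of $np$, which is the regime in which the Stirling-based steps above are accurate.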

(B) Proof of Theorem 3.5.2

For k = 0, we have

$$\lim_{n\to\infty} P_n(0) = e^{-\lambda} \qquad (3.A.43)$$

because $\lim_{n\to\infty}(1-p)^n = \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}$ when $np \to \lambda$. Next, for k = 1, 2, …, n, we have

$$P_n(k) = \frac{n(n-1)\cdots(n-k+1)}{k!}\, p^k (1-p)^{n-k} = \frac{(np)^k}{k!}\,(1-p)^n\, \frac{\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{k-1}{n}\right)}{(1-p)^k}. \qquad (3.A.44)$$

Now, $1 - p \approx e^{-p}$ when $p$ is small, and

$$\frac{\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{k-1}{n}\right)}{(1-p)^k} \to \frac{1-\frac{1}{n}}{1-\frac{\lambda}{n}} \cdot \frac{1-\frac{2}{n}}{1-\frac{\lambda}{n}} \cdots \frac{1-\frac{k-1}{n}}{1-\frac{\lambda}{n}} \cdot \frac{1}{1-\frac{\lambda}{n}} \to 1 \qquad (3.A.45)$$

when k = 1, 2, … is fixed, $p \to \frac{\lambda}{n}$, and $n \to \infty$. Thus, letting $np \to \lambda$, we get $P_n(k) \to \left(e^{-p}\right)^n \frac{(np)^k}{k!} = e^{-\lambda} \frac{\lambda^k}{k!}$: from this result and (3.A.43), we have (3.5.19).
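The limit (3.5.19) can be illustrated numerically; in the sketch below (the values of λ, n, and k are assumptions), the binomial pmf with $p = \lambda/n$ is compared with the Poisson pmf:

```python
import math

lam, n, k = 2.0, 10000, 3   # assumed values for the comparison
p = lam / n

binom = math.comb(n, k) * p**k * (1 - p)**(n - k)
poisson = math.exp(-lam) * lam**k / math.factorial(k)
print(binom, poisson)       # both approximately 0.1804
```

With $n = 10^4$ the two probabilities already agree to several decimal places, and the gap shrinks further as $n$ grows with $np$ held near $\lambda$.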

Appendix 3.3 Distributions and Moment Generating Functions

(A) Discrete distributions: pmf p(k) and mgf M(t)

Bernoulli distribution: α ∈ (0, 1)



$$p(k) = \begin{cases} 1-\alpha, & k = 0, \\ \alpha, & k = 1 \end{cases} \qquad (3.A.46)$$

$M(t) = 1 - \alpha + \alpha e^t$  (3.A.47)

Binomial distribution: n = 1, 2, . . . , α ∈ (0, 1)

$p(k) = {}_n C_k\, \alpha^k (1-\alpha)^{n-k}$, k = 0, 1, …, n  (3.A.48)

$M(t) = \left(1 - \alpha + \alpha e^t\right)^n$  (3.A.49)

Geometric distribution 1: α ∈ (0, 1)

$p(k) = \alpha(1-\alpha)^k$, k = 0, 1, …  (3.A.50)

$M(t) = \dfrac{\alpha}{1 - (1-\alpha)e^t}$  (3.A.51)

Geometric distribution 2: α ∈ (0, 1)

$p(k) = \alpha(1-\alpha)^{k-1}$, k = 1, 2, …  (3.A.52)

$M(t) = \dfrac{\alpha e^t}{1 - (1-\alpha)e^t}$  (3.A.53)

Negative binomial distribution: r > 0, α ∈ (0, 1)

$p(k) = {}_{-r}C_k\, \alpha^r (\alpha-1)^k$, k = 0, 1, …  (3.A.54)

$M(t) = \left(\dfrac{\alpha}{1 - (1-\alpha)e^t}\right)^r$  (3.A.55)

Pascal distribution: r > 0, α ∈ (0, 1)

$p(k) = {}_{k-1}C_{r-1}\, \alpha^r (1-\alpha)^{k-r}$, k = r, r+1, …  (3.A.56)

$M(t) = \left(\dfrac{\alpha e^t}{1 - (1-\alpha)e^t}\right)^r$  (3.A.57)

Poisson distribution: λ > 0

$p(k) = \dfrac{\lambda^k}{k!}\, e^{-\lambda}$, k = 0, 1, …  (3.A.58)

$M(t) = \exp\left\{-\lambda\left(1 - e^t\right)\right\}$  (3.A.59)

Uniform distribution
$p(k) = \dfrac{1}{n}$, k = 0, 1, …, n − 1  (3.A.60)

$M(t) = \dfrac{1 - e^{nt}}{n\left(1 - e^t\right)}$  (3.A.61)
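Two entries of the table can be spot-checked numerically via $M(t) = E\left\{e^{tX}\right\}$; the parameter values in the sketch below are arbitrary assumptions:

```python
import math

t, n, a, lam = 0.3, 12, 0.4, 2.5      # assumed parameter values

# Binomial mgf (3.A.49): E{e^{tX}} summed over the pmf (3.A.48)
mgf_binom = sum(math.exp(t * k) * math.comb(n, k) * a**k * (1 - a)**(n - k)
                for k in range(n + 1))
closed_binom = (1 - a + a * math.exp(t))**n
print(mgf_binom, closed_binom)

# Poisson mgf (3.A.59): the infinite series is truncated at k = 100
mgf_pois = sum(math.exp(t * k) * math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(100))
closed_pois = math.exp(-lam * (1 - math.exp(t)))
print(mgf_pois, closed_pois)
```

The binomial sum matches the closed form exactly by the binomial theorem; the Poisson sum matches up to a negligible truncation error.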

(B) Continuous distributions: pdf f (x) and mgf M(t)

Cauchy distribution: α > 0 (cf ϕ(ω) instead of mgf)

$f(x) = \dfrac{\alpha}{\pi}\, \dfrac{1}{(x-\beta)^2 + \alpha^2}$  (3.A.62)

$\varphi(\omega) = \exp\left(j\beta\omega - \alpha|\omega|\right)$  (3.A.63)

Central chi-square distribution: n = 1, 2, . . .

$f(x) = \dfrac{x^{\frac{n}{2}-1}}{\Gamma\left(\frac{n}{2}\right) 2^{\frac{n}{2}}}\, \exp\left(-\dfrac{x}{2}\right) u(x)$  (3.A.64)

$M(t) = (1-2t)^{-\frac{n}{2}}$  (3.A.65)

Exponential distribution: λ > 0

f (x) = λe−λx u(x) (3.A.66)


$M(t) = \dfrac{\lambda}{\lambda - t}$  (3.A.67)

Gamma distribution: α > 0, β > 0


 
$f(x) = \dfrac{x^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}}\, \exp\left(-\dfrac{x}{\beta}\right) u(x)$  (3.A.68)

$M(t) = \dfrac{1}{(1-\beta t)^{\alpha}}$  (3.A.69)

Laplace (double exponential) distribution: λ > 0

$f(x) = \dfrac{\lambda}{2}\, e^{-\lambda|x|}$  (3.A.70)

$M(t) = \dfrac{\lambda^2}{\lambda^2 - t^2}$  (3.A.71)

Normal distribution
 
$f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, \exp\left(-\dfrac{(x-m)^2}{2\sigma^2}\right)$  (3.A.72)

$M(t) = \exp\left(mt + \dfrac{\sigma^2 t^2}{2}\right)$  (3.A.73)

Rayleigh distribution16 : α > 0


 
$f(x) = \dfrac{x}{\alpha^2}\, \exp\left(-\dfrac{x^2}{2\alpha^2}\right) u(x)$  (3.A.74)

$M(t) = 1 + \sqrt{2\pi}\,\alpha t\, \exp\left(\dfrac{\alpha^2 t^2}{2}\right) \Phi(\alpha t)$  (3.A.75)

Uniform distribution: b > a


$f(x) = \dfrac{1}{b-a}\, u(x-a)u(b-x)$  (3.A.76)

$M(t) = \dfrac{e^{bt} - e^{at}}{(b-a)t}$  (3.A.77)
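As a spot check of (3.A.66)-(3.A.67), the sketch below approximates $E\left\{e^{tX}\right\} = \int_0^{\infty} e^{tx}\lambda e^{-\lambda x}\,dx$ with a Riemann sum for assumed values λ = 1.5 and t = 0.5 < λ:

```python
import math

lam, t = 1.5, 0.5                     # assumed values with t < lambda
dx, upper = 1e-4, 40.0                # step size and truncation point
integral = dx * sum(math.exp(t * i * dx) * lam * math.exp(-lam * i * dx)
                    for i in range(int(upper / dx)))
print(integral, lam / (lam - t))      # both approximately 1.5
```

The integrand decays like $e^{-(\lambda-t)x}$, so the truncation at $x = 40$ contributes a negligible error, and the sum reproduces $\frac{\lambda}{\lambda - t}$.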

16 In (3.A.75), the function Φ(·) is the standard normal cdf defined in (3.5.3).

Exercises

Exercise 3.1 Show that

$$\lim_{x \to \infty} x F(-x) = 0 \qquad (3.E.1)$$

for an absolutely continuous cdf F. Using this result, show that

$$E\{X\} = \int_0^{\infty} \{1 - F_X(x)\}\, dx - \int_{-\infty}^{0} F_X(x)\, dx \qquad (3.E.2)$$

for a random variable X with a continuous and absolutely integrable pdf.

Exercise 3.2 Express the cdf of



⎨ X − c, X > c,
g(X ) = 0, −c ≤ X ≤ c, (3.E.3)

X + c, X < −c

in terms of the cdf FX of a continuous random variable X , where c > 0.

Exercise 3.3 Express the cdf of



X + c, X ≥ 0,
g(X ) = (3.E.4)
X − c, X < 0

in terms of the cdf FX of X , where c > 0.

Exercise 3.4 Express the pdf f Y of Y = a sin(X + θ) in terms of the pdf of X ,


where a > 0 and θ are constants.

Exercise 3.5 Obtain the pmf of X in Example 3.1.17 assuming that each ball taken
is replaced into the box before the following trial.

Exercise 3.6 Obtain the pdf and cdf of Y = X 2 + 1 when X ∼ U [−1, 2).

Exercise 3.7 Obtain the cdf of $Y = X^3 - 3X$ when the pmf of X is $p_X(k) = \frac{1}{7}$ for k ∈ {0, ±1, ±2, ±3}.

Exercise 3.8 Obtain the expected value $E\left\{X^{-1}\right\}$ when the pdf of X is $f_X(r) = \frac{1}{2^{n/2}\Gamma\left(\frac{n}{2}\right)}\, r^{\frac{n}{2}-1} \exp\left(-\frac{r}{2}\right) u(r)$.

Exercise 3.9 Express the pdf of the output Y = X u(X ) of a half-wave rectifier in
terms of the pdf f X of X .

Exercise 3.10 Obtain the pdf of $Y = \left(X - \frac{1}{\theta}\right)^2$ when the pdf of X is $f_X(x) = \theta e^{-\theta x} u(x)$.

Exercise 3.11 Let the pdf and cdf of a continuous random variable X be f X and
FX , respectively. Obtain the conditional cdf FX |b<X ≤a (x) and the conditional pdf
f X |b<X ≤a (x) in terms of f X and FX , where a > b.

Exercise 3.12 For X ∼ U[0, 1), obtain the conditional mean $E\{X|X > a\}$ and conditional variance $Var\{X|X > a\} = E\left\{\left(X - E\{X|X > a\}\right)^2 \,\middle|\, X > a\right\}$ when 0 < a < 1. Obtain the limits of the conditional mean and conditional variance when a → 1.

Exercise 3.13 Obtain the probability P(950 ≤ R < 1050) when the resistance R of
a resistor has the uniform distribution U (900, 1100).

Exercise 3.14 The cost of being early and late by s minutes for an appointment is
cs and ks, respectively. Denoting by f X the pdf of the time X taken to arrive at the
location of the appointment, find the time of departure for the minimum cost.

Exercise 3.15 Let ω be the outcome of a random experiment of taking one ball from
a box containing one each of red, green, and blue balls. Obtain P(X ≤ α), P(X ≤ 0),
and P(2 ≤ X < 4), where

π, ω = green ball or blue ball,
X (ω) = (3.E.5)
0, ω = red ball

with α a real number.


   
Exercise 3.16 For V ∼ U[−1, 1], obtain P(V > 0), $P\left(|V| < \frac{1}{3}\right)$, $P\left(|V| \ge \frac{3}{4}\right)$, and $P\left(\frac{1}{3} < V < \frac{1}{2}\right)$.

Exercise 3.17 In successive tosses of a fair coin, let the number of tosses until the
first head be K . For the two events A = {K > 5} and B = {K > 10}, obtain the
probabilities of A, B, B c , A ∩ B, and A ∪ B.

Exercise 3.18 Data is transmitted via a sequence of N bits through two independent
channels C A and C B . Due to channel noise during the transmission, w A and w B bits
are in error among the sequences A and B of N bits received through channels C A
and C B , respectively. Assume that the noise on a bit does not influence that on others.
(1) Obtain the probability P(D = d) that the number D of error bits common to A
and B is d.
(2) Assume the sequence of N bits is reconstructed by selecting each bit from A
with probability p or from B with probability 1 − p. Obtain the probability
P(K = k) that the number K of error bits is k in the reconstructed sequence of
N bits.
(3) When N = 3 and w A = w B = 1, obtain P(D = d) and P(K = k).

Exercise 3.19 Assume the pdf $f(x) = \frac{c}{x^3}\, u(x-1)$ of X.
(1) Determine the constant c.
(2) Obtain the mean E{X } of X .
(3) Show that the variance of X does not exist.

Exercise 3.20 Assume the cdf

$$F(x) = \begin{cases} 0, & x < 0; \\ \frac{1}{4}\left(x^2+1\right), & 0 \le x < 1; \\ \frac{1}{4}(x+2), & 1 \le x < 2; \\ 1, & x \ge 2 \end{cases} \qquad (3.E.6)$$

of X.
(1) Obtain P(0 < X < 1) and P(1 ≤ X < 1.5).
(2) Obtain the mean μ = E{X } and variance σ 2 = Var{X } of X .

Exercise 3.21 For Y ∼ U [0, 1) and a < b, consider W = a + (b − a)Y .


(1) Obtain the cdf of W .
(2) Obtain the distribution of W .

Exercise 3.22 Obtain the pdf of $Y = \frac{X}{1+X}$ when X ∼ U[0, 1).

Exercise 3.23 Express the pdf $f_Y$ of $Y = \frac{X}{1-X}$ in terms of the pdf $f_X$ of X, and obtain $f_Y$ when X ∼ U[0, 1).

Exercise 3.24 Consider $Y = \frac{X}{1+X}$ and $Z = \frac{Y}{1-Y}$. Then, for the pdf's $f_Z$ and $f_X$ of Z and X, respectively, it should hold true that $f_Z(z) = f_X(z)$ because $Z = \frac{Y}{1-Y} = \frac{X/(1+X)}{1 - X/(1+X)} = X$. Confirm this fact from the results of Exercises 3.22 and 3.23.

Exercise 3.25 Obtain the pdf of $Z = \frac{X}{1-X}$ when the pdf of X is $f_X(y) = \frac{1}{(1-y)^2}\left\{u(y) - u\left(y - \frac{1}{2}\right)\right\}$.

Exercise 3.26 When X ∼ U[0, 1), obtain the pdf of $Y = \frac{1+X}{1-X}$ and the pdf of $Z = \frac{X-1}{X+1}$.

Exercise 3.27 Assuming the pdf

$$f_X(x) = \begin{cases} 1+x, & -1 \le x < 0, \\ 1-x, & 0 \le x < 1, \\ 0, & x < -1 \text{ or } x \ge 1 \end{cases} \qquad (3.E.7)$$

of X, obtain the pdf of Y = |X|.

Exercise 3.28 Consider Y = a cos(X + θ) for a random variable X, where a > 0 and θ are constants.
(1) Obtain the pdf $f_Y$ of Y when X ∼ U[−π, π).
(2) Obtain the pdf $f_Y$ of Y when X ∼ U$\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$, a = 1, and θ = 0.
(3) Obtain the cdf $F_Y$ of Y when X ∼ U$\left(0, \frac{3}{2}\pi\right)$.

Exercise 3.29 Consider Y = tan X for a random variable X.
(1) Express the pdf $f_Y$ of Y in terms of the pdf $f_X$ of X.
(2) Obtain the pdf $f_Y$ of Y when X ∼ U$\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$.
(3) Obtain the pdf $f_Y$ of Y when X ∼ U(0, 2π).

Exercise 3.30 Find a function g for Y = g(X ) with which we can obtain the expo-
nential random variable Y ∼ f Y (y) = λe−λy u(y) from the uniform random variable
X ∼ U [0, 1).

Exercise 3.31 For a random variable X with pmf $p_X(v) = \frac{1}{6}$ for v = 1, 2, …, 6, obtain the expected value, mode, and median of X.

Exercise 3.32 Assume $p_X(k) = \frac{1}{7}$ for k ∈ {1, 2, 3, 5, 15, 25, 50}. Show that the value of c that minimizes E{|X − c|} is 5. Compare this value with the value of b that minimizes $E\left\{(X-b)^2\right\}$.

Exercise 3.33 Obtain the mean and variance of a random variable X with pdf $f(x) = \frac{\lambda}{2} e^{-\lambda|x|}$, where λ > 0.
e , where λ > 0.
Exercise 3.34 For a random variable X with pdf $f(r) = \frac{r^{\alpha-1}(1-r)^{\beta-1}}{\tilde{B}(\alpha,\beta)}\, u(r)u(1-r)$, show that the k-th moment is

$$E\left\{X^k\right\} = \frac{\Gamma(\alpha+k)\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+k)\Gamma(\alpha)} \qquad (3.E.8)$$

and obtain the mean and variance of X.

Exercise 3.35 Show that $E\{X\} = R'(0)$ and $Var\{X\} = R''(0)$, where $R(t) = \ln M(t)$ and $M(t)$ is the mgf of X.

Exercise 3.36 Obtain the pdf’s f Y , f Z , and f W of Y = X − 2, Z = 2Y , and W =


Z + 1, respectively, when the pdf of X is f X (x) = u(x) − u(x − 1).

Exercise 3.37 Obtain the pmf's $p_Y$, $p_Z$, and $p_W$ of Y = X + 2, Z = X − 2, and $W = \frac{X-2}{X+2}$, respectively, when the pmf of X is $p_X(k) = \frac{1}{4}$ for k = 1, 2, 3, 4.

Exercise 3.38 For a random variable X such that P(0 ≤ X ≤ a) = 1, show the following:
(1) $E\left\{X^2\right\} \le aE\{X\}$. Specify when the equality holds true.
(2) $Var\{X\} \le E\{X\}(a - E\{X\})$.
(3) $Var\{X\} \le \frac{a^2}{4}$. Specify when the equality holds true.

Exercise 3.39 The value of a random variable X is not less than 0.
(1) When X can assume values in {0, 1, 2, …}, show that the expected value $E\{X\} = \sum_{n=0}^{\infty} n P(X = n)$ satisfies

$$E\{X\} = \sum_{n=0}^{\infty} P(X > n) = \sum_{n=1}^{\infty} P(X \ge n). \qquad (3.E.9)$$


(2) Express $E\left\{X_c^-\right\}$ in terms of $F_X(x) = P(X \le x)$, where $X_c^- = \min(X, c)$ for a constant c.
(3) Express $E\left\{X_c^+\right\}$ in terms of $F_X(x)$, where $X_c^+ = \max(X, c)$ for a constant c.
   
Exercise 3.40 Assume the cf $\varphi_X(\omega) = \exp\left\{\mu\left[\exp\left\{\lambda\left(e^{j\omega} - 1\right)\right\} - 1\right]\right\}$ of X, where λ > 0 and μ > 0.
(1) Obtain E{X} and Var{X}.
(2) Show that $P(X = 0) = \exp\left\{-\mu\left(1 - e^{-\lambda}\right)\right\}$.

Exercise 3.41 Let N be the number of hydrogen molecules in a sphere of radius r with volume $V = \frac{4\pi}{3}r^3$. Assuming that N has the Poisson pmf $p_N(n) = \frac{1}{n!} e^{-\rho V}(\rho V)^n$ for n = 0, 1, 2, …, obtain the pdf of the distance X from the center of the sphere to the closest hydrogen molecule. (Hint: Try to express P(X > x) via $p_N$.)

Exercise 3.42 Determine the constant A when



⎨ Ax, 0 ≤ x ≤ 4,
f (x) = A(8 − x), 4 ≤ x ≤ 8, (3.E.10)

0, otherwise

is the pdf of X . Sketch the cdf and pdf, and obtain P(X ≤ 6).

Exercise 3.43 The median α of a continuous random variable X can be defined via $\int_{-\infty}^{\alpha} f_X(x)\,dx = \int_{\alpha}^{\infty} f_X(x)\,dx = \frac{1}{2}$. Show that the value of b minimizing E{|X − b|} is α.

Exercise 3.44 When F is the cdf of continuous random variable X , obtain the
expected value E{F(X )} of F(X ).

Exercise 3.45 Obtain the mgf and cf of a negative exponential random variable X
with pdf f X (x) = e x u(−x). Using the mgf, obtain the first four moments. Compare
the results with those obtained directly.

Exercise 3.46 Show that $f_X(x) = \frac{1}{\cosh(\pi x)}$ can be a pdf. Obtain the corresponding mgf.

Exercise 3.47 When $f(x) = \frac{\alpha}{\cosh^n(\beta x)}$ is a pdf, determine the value of α in terms of β. Note that this pdf is the same as the logistic pdf (2.5.30) when n = 2 and β = k/2.

Exercise 3.48 Show that the mgf is as shown in (3.A.75) for the Rayleigh random variable with pdf $f(x) = \frac{x}{\alpha^2} \exp\left(-\frac{x^2}{2\alpha^2}\right) u(x)$.

Exercise 3.49 For X ∼ b(n, p) with q = 1 − p, show

$$P(X = \text{an even number}) = \frac{1}{2}\left\{1 + (q-p)^n\right\} \qquad (3.E.11)$$

and the recurrence formula

$$P(X = k+1) = \frac{(n-k)p}{(k+1)q}\, P(X = k) \qquad (3.E.12)$$

for k = 0, 1, …, n − 1.

Exercise 3.50 When X ∼ P(λ), show

$$P(X = \text{an even number}) = \frac{1}{2}\left(1 + e^{-2\lambda}\right) \qquad (3.E.13)$$

and the recurrence formula

$$P(X = k+1) = \frac{\lambda}{k+1}\, P(X = k) \qquad (3.E.14)$$

for k = 0, 1, 2, ….

Exercise 3.51 When the pdf of X is

$$f_X(x) = \begin{cases} \frac{1}{2}, & -1 < x \le 0, \\ \frac{1}{4}(2-x), & 0 < x \le 2, \\ 0, & \text{otherwise}, \end{cases} \qquad (3.E.15)$$

obtain the pdf $f_Y$ of Y = |X|.

Exercise 3.52 Find a cdf that has infinitely many jumps in the finite interval (a, b).

Exercise 3.53 Assume that

$$F_X(x) = \begin{cases} 0, & x < -2; \\ \frac{1}{3}(x+2), & -2 \le x < -1; \\ ax+b, & -1 \le x < 3; \\ 1, & x \ge 3 \end{cases} \qquad (3.E.16)$$

is the cdf of X.

(1) Obtain the condition that the two constants a and b should satisfy and sketch the
region of the condition on a plane with the a-b coordinates.
(2) When $a = \frac{1}{8}$ and $b = \frac{5}{8}$, obtain the cdf of $Y = X^2$, P(Y = 1), and P(Y = 4).

Exercise 3.54 Obtain the cdf of $Y = X^2$ for the cdf

$$F_X(x) = \begin{cases} 0, & x < -1; \\ \frac{1}{2}(x+1), & -1 \le x < 0; \\ \frac{1}{8}(x+4), & 0 \le x < 4; \\ 1, & x \ge 4 \end{cases} \qquad (3.E.17)$$

of X.

Exercise 3.55 Show that

$$F_Y(y) = \begin{cases} F_X(\alpha), & y < -2 \text{ or } y > 2, \\ F_X(-2) + P(X = 1), & y = -2, \\ F_X(\beta_3) - F_X(\beta_2) + F_X(\beta_1) + P(X = \beta_2), & -2 < y < 2, \\ F_X(2), & y = 2 \end{cases} \qquad (3.E.18)$$

is the cdf of $Y = X^3 - 3X$ expressed in terms of the cdf $F_X$ of X. Here, α is the only real root of the equation $y = x^3 - 3x$ when y > 2 or y < −2, and $\beta_1 < \beta_2 < \beta_3$ are the three real roots of the equation when −2 < y < 2, with α, $\beta_1$, $\beta_2$, and $\beta_3$ all being functions of y.

Exercise 3.56 Discuss the continuity of the cdf FY (y) shown in (3.E.18) when X is
a continuous random variable.

Exercise 3.57 For a random variable X with pmf

p X (k) = P(X = k), k ∈ {· · · , −1, 0, 2, . . .}, (3.E.19)

obtain the cdf FY of Y = X 3 − 3X and discuss its continuity.

Exercise 3.58 Obtain the cdf of $Y = X^2$ for the cdf

$$F_X(x) = \begin{cases} 0, & x < -1, \\ -\frac{1}{2}\left(x^2 - 1\right), & -1 \le x < 0, \\ \frac{1}{2}\left(x^2 + 1\right), & 0 \le x < 1, \\ 1, & x \ge 1 \end{cases} \qquad (3.E.20)$$

of X.
Exercise 3.59 Assume the cf $\varphi(\omega) = \frac{1}{2\pi}\int_0^{2\pi} \exp\left(-\frac{\omega^2}{2}\alpha(\theta)\right) d\theta$ of a random variable.
(1) Obtain the cf's by completing the integral when $\alpha(\theta) = \frac{1}{2}$ and when $\alpha(\theta) = \cos^2\theta$.

(2) When $\alpha(\theta) = \frac{1}{2}$, obtain the pdf $f_1$.
(3) Show that

$$f_2(x) = \frac{1}{\sqrt{2\pi^3}} \exp\left(-\frac{x^2}{4}\right) K_0\left(\frac{x^2}{4}\right) \qquad (3.E.21)$$

is the pdf when $\alpha(\theta) = \cos^2\theta$. Here,

$$K_0(x) = \left\{\int_0^{\infty} \frac{\cos(xt)}{\sqrt{1+t^2}}\, dt\right\} u(x) \qquad (3.E.22)$$

is the zeroth-order modified Bessel function of the second kind.
(4) Obtain the mean and variance for $f_1$ and those for $f_2$.
(5) Show that the pdf $f_2$ is heavier-tailed than $f_1$: that is, $\frac{f_2(x)}{f_1(x)} > 1$ when x is sufficiently large.

Exercise 3.60 From (3.3.31), we have

$$E\left\{X^n\right\} = \begin{cases} 0, & n \text{ is odd,} \\ 1 \times 3 \times \cdots \times (n-1), & n \text{ is even} \end{cases} \qquad (3.E.23)$$

when X ∼ N(0, 1). Show that

$$E\left\{Y^p\right\} = \sum_{n=0}^{\lfloor p/2 \rfloor} \frac{p!}{2^n n! (p-2n)!}\, \sigma^{2n} m^{p-2n} \qquad (3.E.24)$$

for p = 0, 1, … when Y ∼ N$\left(m, \sigma^2\right)$. The result (3.E.24) implies that we have $E\left\{Y^0\right\} = 1$, $E\left\{Y^1\right\} = m$, $E\left\{Y^2\right\} = m^2 + \sigma^2$, $E\left\{Y^3\right\} = m^3 + 3m\sigma^2$, $E\left\{Y^4\right\} = m^4 + 6m^2\sigma^2 + 3\sigma^4$, $E\left\{Y^5\right\} = m^5 + 10m^3\sigma^2 + 15m\sigma^4$, … when Y ∼ N$\left(m, \sigma^2\right)$. (Hint: Note that Y = σX + m.)

Exercise 3.61 Show that

$$E\left\{X^n\right\} = \begin{cases} 1 \times 3 \times \cdots \times n\, \sqrt{\frac{\pi}{2}}\, \alpha^n, & n = 2k+1, \\ 2^k k!\, \alpha^{2k}, & n = 2k \end{cases} \qquad (3.E.25)$$

and obtain the mean and variance for a Rayleigh random variable X with pdf $f_X(x) = \frac{x}{\alpha^2}\exp\left(-\frac{x^2}{2\alpha^2}\right)u(x)$.

Exercise 3.62 Show that

$$E\left\{X^n\right\} = \begin{cases} 1 \times 3 \times \cdots \times (n+1)\, \alpha^n, & n = 2k, \\ 2^k k!\, \alpha^{2k-1} \sqrt{\frac{2}{\pi}}, & n = 2k-1 \end{cases} \qquad (3.E.26)$$

for a random variable X with pdf $f_X(x) = \sqrt{\frac{2}{\pi}}\, \frac{x^2}{\alpha^3} \exp\left(-\frac{x^2}{2\alpha^2}\right) u(x)$.

Exercise 3.63 For a random variable with pdf $f_X(x) = \frac{1}{2}\, u(x)u(\pi - x)\sin x$, obtain the mean and second moment.

Exercise 3.64 Show that the mean is $E\{X\} = \frac{1-\alpha}{\alpha}\, r$ and the variance is $Var\{X\} = \frac{1-\alpha}{\alpha^2}\, r$ for the NB random variable X with the pmf $p(x) = {}_{-r}C_x\, \alpha^r (\alpha-1)^x$, $x \in J_0$ shown in (2.5.13). When the pmf of Y is $p_Y(y) = {}_{y-1}C_{r-1}\, \alpha^r (1-\alpha)^{y-r}$ for y = r, r+1, … as shown in (2.5.17), show that

$$E\{Y\} = \frac{r}{\alpha} \qquad (3.E.27)$$

and $Var\{Y\} = \frac{1-\alpha}{\alpha^2}\, r$.

Exercise 3.65 Consider the absolute value Y = |X| for a continuous random variable X with pdf $f_X$. If we consider the half mean17

$$m_X^{\pm} = \int_{-\infty}^{\infty} x f_X(x) u(\pm x)\, dx, \qquad (3.E.28)$$

then the mean of X can be expressed as $m_X = m_X^+ + m_X^-$. Show that the mean of Y = |X| can be expressed as

$$E\{|X|\} = m_X^+ - m_X^-, \qquad (3.E.29)$$

and obtain the variance of Y = |X| in terms of the variance and half means of X.

Exercise 3.66 For a continuous random variable X with cdf FX , show that

E{Z } = 1 − 2FX (0) (3.E.30)

and

Var{Z } = 4FX (0) {1 − FX (0)} (3.E.31)

are the mean and variance, respectively, of Z = sgn(X ).

Exercise 3.67 For a random variable X with pmf p(k) = (1 − α)k α for k ∈
{0, 1, . . .}, obtain the mean and variance. For a random variable X with pmf
p(k) = (1 − α)k−1 α for k ∈ {1, 2, . . .}, obtain the mean and variance.

17 More generally, $\int_0^{\infty} x^m f_X(x)\,dx$ for m = 1, 2, … are called the half moments, incomplete moments, or partial moments.



Exercise 3.68 Consider a hypergeometric random variable X with pmf

$$p_X(x) = \frac{1}{\binom{\alpha+\beta}{\gamma}} \binom{\alpha}{\gamma-x} \binom{\beta}{x}, \qquad (3.E.32)$$

where α, β, and γ are natural numbers such that α + β ≥ γ. Note that min(β, γ) ≥ max(0, γ − α) and that the pmf (3.E.32) is zero when x ∉ {max(0, γ − α), max(0, γ − α) + 1, …, min(β, γ)}. Obtain the mean and variance of X. Show that the ν-th moment of X is

$$E\left\{X^{\nu}\right\} = \sum_{k=0}^{\nu} \begin{Bmatrix} \nu \\ k \end{Bmatrix} [\beta]_k\, \frac{\binom{\alpha+\beta-k}{\gamma-k}}{\binom{\alpha+\beta}{\gamma}}. \qquad (3.E.33)$$

Here,

$$\begin{Bmatrix} \nu \\ k \end{Bmatrix} = \frac{1}{k!}\sum_{i=0}^{k} (-1)^i \binom{k}{i} (k-i)^{\nu} \qquad (3.E.34)$$

is the Stirling number of the second kind, and α + β, β, and γ represent the size of the group, number of 'successes', and number of trials, respectively.

Exercise 3.69 For a random variable X with pdf $f(x) = \frac{k e^{-kx}}{\left(1+e^{-kx}\right)^2}$, show that $E\left\{X^2\right\} = \frac{\pi^2}{3k^2}$, $E\left\{X^4\right\} = \frac{7\pi^4}{15k^4}$, and $m_L^+ = \frac{\ln 2}{k} = -m_L^-$, where the half means $m_L^+$ and $m_L^-$ are defined in (3.E.28).

Exercise 3.70 Obtain the mgf, expected value, and variance of a random variable Y with pdf $f_Y(x) = \frac{\lambda^n}{(n-1)!}\, x^{n-1} e^{-\lambda x} u(x)$.
Exercise 3.71 A coin with probability p of head is tossed twice in one trial. Define
X n as

⎨ 1, if the outcome is head and then tail,
X n = −1, if the outcome is tail and then head, (3.E.35)

0, if the two outcomes are the same

based on the two outcomes from the n-th trial. Obtain the cdf and mean of Y =
min{n : n ≥ 1, X n = 1 or − 1}.
Exercise 3.72 Assume a cdf F such that F(x) = 0 for x < 0, F(x) < 1 for 0 ≤ x < ∞, and

$$\frac{1 - F(x+y)}{1 - F(y)} = 1 - F(x) \qquad (3.E.36)$$

for 0 ≤ x < ∞ and 0 ≤ y < ∞. Show that there exists a positive number β satisfying $1 - F(x) = \exp\left(-\frac{x}{\beta}\right)$ for x ≥ 0.

Exercise 3.73 For a geometric random variable X , show

P(X > m + n|X > m) = P(X > n). (3.E.37)

 
Exercise 3.74 In the distribution $b\left(10, \frac{1}{3}\right)$, at which value of k is $P_{10}(k)$ the largest? At which value of k is $P_{11}(k)$ the largest in the distribution $b\left(11, \frac{1}{2}\right)$?

Exercise 3.75 The probability of a side effect from a flu shot is 0.005. When 1000
people get the flu shot, obtain the following probabilities and their approximate
values:
(1) The probability P01 that at most one person experiences the side effect.
(2) The probability P456 that four, five, or six persons experience the side effect.

Exercise 3.76 Show that the skewness and kurtosis18 of b(n, p) are $\frac{1-2p}{\sqrt{np(1-p)}}$ and $\frac{3(n-2)}{n} + \frac{1}{np(1-p)}$, respectively, based on Definition 3.3.10.

Exercise 3.77 Consider a Poisson random variable X with rate λ.


 
(1) Show that E X 3 = λ + 3λ2 + λ3 , E X 4 = λ + 7λ2 + 6λ3 + λ4 , μ2 = μ3 =
λ, and μ4 = λ + 3λ2 .
(2) Obtain the coefficient of variation.
(3) Obtain the skewness and kurtosis, and compare them with those of normal dis-
tribution.

Exercise 3.78 When X is an exponential random variable with parameter λ, show that $Y = \sqrt{2\sigma^2 \lambda X}$ is a Rayleigh random variable with parameter σ.

Exercise 3.79 Consider the continuous cdf

$$F_3(x) = \begin{cases} 0, & x \le 0; \\ \frac{x}{2}, & 0 \le x \le 1; \\ \frac{1}{2}, & 1 \le x \le 2; \\ \frac{1}{2}(x-1), & 2 \le x \le 3; \\ 1, & x \ge 3. \end{cases} \qquad (3.E.38)$$

Confirm that $F_3\left(F_3^{-1}(u)\right) = u$, $F_3^{-1}(F_3(x)) \le x$, and $P\left(F_3^{-1}(F_3(X)) \ne X\right) = 0$ when X ∼ $F_3$. Sketch $F_3(x)$, $F_3^{-1}(u)$, and $F_3^{-1}(F_3(x))$.

Exercise 3.80 Consider the continuous cdf

18 Here, when n → ∞, p → 0, and np → λ, the skewness is $\frac{1}{\sqrt{\lambda}}$ and the kurtosis is $3 + \frac{1}{\lambda}$. In addition, when $\sigma^2 = np(1-p) \to \infty$, the skewness is $\frac{1-2p}{\sigma} \to 0$ and the kurtosis is $3 + \frac{1-6p(1-p)}{\sigma^2} \to 3$.

$$F_4(x) = \begin{cases} 0, & x \le 0; \\ \frac{2x}{3}, & 0 \le x \le 1; \\ \frac{2}{3}, & 1 \le x \le 2; \\ \frac{x}{3}, & 2 \le x \le 3; \\ 1, & x \ge 3. \end{cases} \qquad (3.E.39)$$

Confirm that $F_4\left(F_4^{-1}(u)\right) = u$, $F_4^{-1}(F_4(x)) \le x$, and $P\left(F_4^{-1}(F_4(X)) \ne X\right) = 0$ when X ∼ $F_4$. Sketch $F_4(x)$, $F_4^{-1}(u)$, and $F_4^{-1}(F_4(x))$.

Exercise 3.81 When the pdf of X is

$$f_X(x) = \frac{1}{2}u(x)u(1-x) + \frac{1}{4}u(x-1)u(3-x) = \begin{cases} \frac{1}{2}, & 0 \le x < 1, \\ \frac{1}{4}, & 1 \le x < 3, \\ 0, & \text{otherwise}, \end{cases} \qquad (3.E.40)$$

obtain and sketch the pdf of $Y = \sqrt{X}$.

References

N. Balakrishnan, Handbook of the Logistic Distribution (Marcel Dekker, New York, 1992)
E.F. Beckenbach, R. Bellam, Inequalities (Springer, Berlin, 1965)
P.J. Bickel, K.A. Doksum, Mathematical Statistics (Holden-Day, San Francisco, 1977)
P.O. Börjesson, C.-E.W. Sundberg, Simple approximations of the error function Q(x) for commu-
nications applications. IEEE Trans. Commun. 27(3), 639–643 (Mar. 1979)
W. Feller, An Introduction to Probability Theory and Its Applications, 3rd edn., revised printing
(Wiley, New York, 1970)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
B.R. Gelbaum, J.M.H. Olmsted, Counterexamples in Analysis (Holden-Day, San Francisco, 1964)
G.J. Hahn, S.S. Shapiro, Statistical Models in Engineering (Wiley, New York, 1967)
J. Hajek, Nonparametric Statistics (Holden-Day, San Francisco, 1969)
J. Hajek, Z. Sidak, P.K. Sen, Theory of Rank Tests, 2nd edn. (Academic, New York, 1999)
N.L. Johnson, S. Kotz, Distributions in Statistics: Continuous Univariate Distributions, vol. I, II
(Wiley, New York, 1970)
S.A. Kassam, Signal Detection in Non-Gaussian Noise (Springer, New York, 1988)
A.I. Khuri, Advanced Calculus with Applications in Statistics (Wiley, New York, 2003)
P. Komjath, V. Totik, Problems and Theorems in Classical Set Theory (Springer, New York, 2006)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn.
(Prentice Hall, New York, 2008)
M. Loeve, Probability Theory, 4th edn. (Springer, New York, 1977)
E. Lukacs, Characteristic Functions, 2nd edn. (Griffin, London, 1970)
R.N. McDonough, A.D. Whalen, Detection of Signals in Noise, 2nd edn. (Academic, New York,
1995)
D. Middleton, An Introduction to Statistical Communication Theory (McGraw-Hill, New York,
1960)
A. Papoulis, The Fourier Integral and Its Applications (McGraw-Hill, New York, 1962)

A. Papoulis, S.U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn.
(McGraw-Hill, New York, 2002)
S.R. Park, Y.H. Kim, S.C. Kim, I. Song, Fundamentals of Random Variables and Statistics (in
Korean) (Freedom Academy, Paju, 2017)
V.K. Rohatgi, A.KMd.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New
York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
I. Song, J. Bae, S.Y. Kim, Advanced Theory of Signal Detection (Springer, Berlin, 2002)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
A.A. Sveshnikov (ed.), Problems in Probability Theory, Mathematical Statistics and Theory of
Random Functions (Dover, New York, 1968)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
R.D. Yates, D.J. Goodman, Probability and Stochastic Processes (Wiley, New York, 1999)
D. Zwillinger, S. Kokoska, CRC Standard Probability and Statistics Tables and Formulae (CRC,
New York, 1999)
Chapter 4
Random Vectors

We consider the concept and applications of random vectors in this chapter. In


describing the probabilistic properties of a random vector, we need to specify not
only the probabilistic properties of each of the element random variables, but also
the relationships among random variables.

4.1 Distributions of Random Vectors

In this section, we concentrate on the distributions (Abramowitz and Stegun 1972;


Kassam 1988; Mardia 1970; Song et al. 2002; Stuart and Ord 1987) of random
vectors. The notion of joint probability functions, i.e., joint cdf, joint pdf, and joint
pmf, are discussed to completely characterize a random vector.

4.1.1 Random Vectors

Definition 4.1.1 (random vector; continuous random vector; discrete random vec-
tor) A vector consisting of a number of random variables is called a random vector,
multi-dimensional random vector, multi-variate random variables, or joint random
variables. If the components of a random vector are all continuous random vari-
ables or all discrete random variables, then the random vector is called a continuous
random vector or a discrete random vector, respectively.

Conversely, a random variable is a one-dimensional random vector, and a random


process, also called a stochastic process, is a random vector of infinite dimension.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 255
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8_4

Often, the terms random variable, random vector, and random process are used inter-
changeably.
When the size of a random vector is n, it is called an n-dimensional, n-variate,
or n-variable random vector. When some of the components of a random vector are
discrete random variables and some are continuous random variables, the random
vector is called a mixed-type or hybrid random vector. In this book, we mostly
consider only continuous and discrete random vectors.

Definition 4.1.2 (joint cdf) For a random vector X = (X_1, X_2, ..., X_n), the function

F_X(x) = P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n),          (4.1.1)

describing the probabilistic characteristics of X via the probability of the joint event ∩_{i=1}^{n} {X_i ≤ x_i}, is called the joint cdf of X, where x = (x_1, x_2, ..., x_n).

The joint cdf is often simply called the cdf.

Example 4.1.1 Letting X_1 and X_2 be the numbers showing on the first and second rolls of a fair die, X = (X_1, X_2) is a bi-variate discrete random vector. ♦

Example 4.1.2 (Thomas 1986) A fair coin is tossed three times. Let X =
(X 1 , X 2 , X 3 ), where X i denotes the outcome from the i-th toss with 1 and 0 repre-
senting the head and tail, respectively. Then, the value of the discrete random vector
X is one of (0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1), and (1, 1, 1). The joint cdf F_X(x) = P(∩_{i=1}^{3} {X_i ≤ x_i}) of X is

F_X(x) = 0,   if x_1 < 0, x_2 < 0, or x_3 < 0;
       = 1/8, if 0 ≤ x_1 < 1, 0 ≤ x_2 < 1, and 0 ≤ x_3 < 1;
       = 1/4, if exactly one of x_1, x_2, x_3 is ≥ 1 and the other two are in [0, 1);
       = 1/2, if exactly two of x_1, x_2, x_3 are ≥ 1 and the remaining one is in [0, 1);
       = 1,   if x_1 ≥ 1, x_2 ≥ 1, and x_3 ≥ 1;          (4.1.2)

where x = (x_1, x_2, x_3). ♦
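The joint cdf of Example 4.1.2 can be checked by brute-force enumeration of the eight equally likely outcomes; a minimal sketch (not part of the text, using only the standard library):

```python
from itertools import product

# Enumeration check of Example 4.1.2: three fair-coin tosses,
# F_X(x) = P(X1 <= x1, X2 <= x2, X3 <= x3) with 8 equally likely outcomes.
def F(x1, x2, x3):
    outcomes = list(product((0, 1), repeat=3))
    return sum(o[0] <= x1 and o[1] <= x2 and o[2] <= x3
               for o in outcomes) / len(outcomes)

print(F(0.5, 0.5, 0.5))  # 0.125: only (0, 0, 0) qualifies
print(F(0.5, 1, 0.5))    # 0.25: exactly one coordinate >= 1
print(F(1, 1, 0.5))      # 0.5: exactly two coordinates >= 1
print(F(1, 1, 1))        # 1.0
```

The four printed values match the four non-zero cases of (4.1.2).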

Often, but not necessarily, the probability functions of a subvector of a random


vector are indicated with the term ‘marginal’ in front of the probability functions.
For example, for the random vector (X, Y ), the cdf, pdf, or pmf of X is called the
marginal cdf, marginal pdf, or marginal pmf of X , respectively.

Theorem 4.1.1 For a random vector X, the cdf of X_i can be obtained as

F_{X_i}(x_i) = lim_{x_j → ∞, j ≠ i} F_X(x)          (4.1.3)

from the joint cdf F_X(x) of X.

Proof We prove the theorem in the bi-variate case because the proofs in other cases are similar. Assume a sequence {y_n}_{n=1}^{∞} increasing to infinity. Then, the events {X_1 ≤ x, X_2 ≤ y_n}_{n=1}^{∞} are increasing sets, and thus lim_{n→∞} {X_1 ≤ x, X_2 ≤ y_n} = ∪_{n=1}^{∞} {X_1 ≤ x, X_2 ≤ y_n} = {X_1 ≤ x, X_2 ≤ ∞} = {X_1 ≤ x}. This result implies lim_{n→∞} P(X_1 ≤ x, X_2 ≤ y_n) = P(X_1 ≤ x) from the continuity of probability. In other words, we have

F_{X_1}(x) = lim_{y→∞} F_{X_1,X_2}(x, y),          (4.1.4)

which completes the proof. ♠



In general, for a subvector X_s = (X_{s_1}, X_{s_2}, ..., X_{s_m}) of X, we have

F_{X_s}(x_s) = lim_{x_j → ∞, j ∉ I_s} F_X(x),          (4.1.5)

where I_s = {s_1, s_2, ..., s_m} and x_s = (x_{s_1}, x_{s_2}, ..., x_{s_m}).
 
Definition 4.1.3 (joint pdf) For a measurable space (Ω, F) = (R^k, B(R^k)), a real function f is called a (k-dimensional) joint pdf if it satisfies

f(x) ≥ 0,  x ∈ R^k          (4.1.6)

and

∫_{R^k} f(x) dx = 1,          (4.1.7)

where dx = dx_1 dx_2 ··· dx_k.

Often, a joint pdf is called simply a pdf if it does not incur any confusion. Consider the set function

P(G) = ∫_{x∈G} f(x) dx          (4.1.8)

for an event G ∈ B(R^k). The set function P shown in (4.1.8) is a probability measure
and is an extension of (2.5.20), a relationship between the pdf and probability measure
in the one-dimensional space.
Example 4.1.3 When {f_i}_{i=1}^{k} are one-dimensional pdf's, a pdf f(x) that can be expressed as f(x) = ∏_{i=1}^{k} f_i(x_i) is called a product pdf. ♦

Theorem 4.1.2 For a random vector X = (X_1, X_2, ..., X_n) with joint cdf F_X(x), the joint pdf can be obtained as

f_X(x) = (∂^n / ∂x) F_X(x),          (4.1.9)

where ∂x = ∂x_1 ∂x_2 ··· ∂x_n.

Conversely, based on Definition 4.1.2 and Theorem 4.1.2, the joint cdf can be obtained as

F_X(x) = ∫_{−∞}^{x_n} ∫_{−∞}^{x_{n−1}} ··· ∫_{−∞}^{x_1} f_X(t) dt          (4.1.10)

from the joint pdf, where t = (t_1, t_2, ..., t_n) and dt = dt_1 dt_2 ··· dt_n. The joint cdf F_X and joint pdf f_X characterize the probabilistic properties of X.
The marginal pdf f_{X_i}(x_i) = (d/dx_i) F_{X_i}(x_i) of X_i can be obtained as

f_{X_i}(x_i) = (d/dx_i) ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} ∫_{−∞}^{x_i} ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f_X(t) dt
             = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f_X(x) dx_a dx_b          (4.1.11)

by differentiating the marginal cdf F_{X_i}(x_i) of X_i after using (4.1.10) in (4.1.3), where dx_a = dx_1 dx_2 ··· dx_{i−1} and dx_b = dx_{i+1} dx_{i+2} ··· dx_n. More generally, for a subvector X_s = (X_{s_1}, X_{s_2}, ..., X_{s_m}) of X, we have

f_{X_s}(x_s) = (∂^m / ∂x_s) { lim_{x_j → ∞, j ∉ I_s} F_X(x) }          (4.1.12)

from (4.1.5), where I_s = {s_1, s_2, ..., s_m} and ∂x_s = ∂x_{s_1} ∂x_{s_2} ··· ∂x_{s_m}.


Definition 4.1.4 (joint pmf) A real-valued function p is called a k-dimensional joint pmf if it satisfies

p(x) ≥ 0          (4.1.13)

and

Σ_{x ∈ Ω^k} p(x) = 1          (4.1.14)

for all x = (x_1, x_2, ..., x_k) ∈ Ω^k on a measurable space (Ω^k, F^k), where Ω and F are a discrete sample space and the corresponding event space, respectively.

Consider the set function

P(G) = Σ_{x ∈ G} p(x)          (4.1.15)

for an event G ∈ F^k. The set function P shown in (4.1.15) is a probability measure, and is an extension of (2.5.5), a relationship between the pmf and probability measure in the one-dimensional space.

Example 4.1.4 Consider the sample space Ω^k = J_{n+1}^k = {0, 1, ..., n}^k and k numbers {α_j ∈ (0, 1)}_{j=1}^{k} such that Σ_{j=1}^{k} α_j = 1. Then,

p(x) = C(n; x_1, x_2, ..., x_k) α_1^{x_1} α_2^{x_2} ··· α_k^{x_k}  if Σ_{i=1}^{k} x_i = n and x_i ∈ J_{n+1}; and p(x) = 0 otherwise,          (4.1.16)

is called the multinomial¹ pmf, where

C(n; n_1, n_2, ..., n_k) = n! / (n_1! n_2! ··· n_k!)  if Σ_{i=1}^{k} n_i = n, and 0 otherwise,          (4.1.17)

is the multinomial coefficient discussed in (1.4.63). The multinomial pmf is a generalization of the binomial pmf, and reduces to the binomial pmf when k = 2. ♦

Example 4.1.5 Consider the face number from a rolling of a fair die, and let A = {1, 2, 3}, B = {4, 5}, and C = {6} as in Example 1.4.24. In ten rollings, the probability of four occurrences of A, five of B, and one of C is C(10; 4, 5, 1) (1/2)^4 (1/3)^5 (1/6)^1 = 35/648 ≈ 0.054. ♦
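The multinomial probability of Example 4.1.5 is easy to verify numerically; a quick check (not part of the text, standard library only):

```python
from math import factorial

# Example 4.1.5: ten rolls of a fair die; A = {1,2,3} (prob 1/2) occurs 4 times,
# B = {4,5} (prob 1/3) occurs 5 times, C = {6} (prob 1/6) occurs once.
# Multinomial coefficient (4.1.17), then the pmf value (4.1.16).
coeff = factorial(10) // (factorial(4) * factorial(5) * factorial(1))
p = coeff * (1/2)**4 * (1/3)**5 * (1/6)**1
print(coeff)        # 1260
print(round(p, 4))  # 0.054  (= 35/648)
```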

When X = (X 1 , X 2 , . . . , X n ) is an n-dimensional discrete random vector, the


joint cdf can be expressed as (4.1.1) and the joint pmf can be written as

p X (x) = P (X = x) . (4.1.18)

1 The multinomial pmf is discussed also in Appendix 4.1.



The joint cdf (4.1.1) can be expressed as

F_X(x) = Σ_{t ≤ x} p_X(t)          (4.1.19)

in terms of the joint pmf (4.1.18), where {t ≤ x} denotes {t_1 ≤ x_1, t_2 ≤ x_2, ..., t_n ≤ x_n}.

4.1.2 Bi-variate Random Vectors

We now consider two-dimensional random vectors in detail because they are used
more frequently than, and provide insights on, higher dimensional random vectors.
Let FX,Y (x, y) = P(X ≤ x, Y ≤ y) and f X,Y be the joint cdf and pdf, respectively,
of a two-dimensional random vector (X, Y ). Then, the joint pdf can be written as

∂2
f X,Y (x, y) = FX,Y (x, y) (4.1.20)
∂ x∂ y

in terms of the joint cdf FX,Y , and the joint cdf can be expressed as
F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(u, v) du dv          (4.1.21)

in terms of the joint pdf f X,Y . In addition, we can obtain the (marginal) cdf FX (x) =
P(X ≤ x) as

F_X(x) = lim_{y→∞} F_{X,Y}(x, y)          (4.1.22)


from the joint cdf F_{X,Y}, and the (marginal) pdf f_X(x) = (d/dx) F_X(x) as

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy          (4.1.23)

from the joint pdf f X,Y . In (4.1.22) and (4.1.23), interchanging the two random
variables X and Y , we have the cdf FY (y) = P(Y ≤ y) of Y as

F_Y(y) = lim_{x→∞} F_{X,Y}(x, y)          (4.1.24)

and the pdf f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx of Y from f_Y(y) = (d/dy) F_Y(y).
FY (y).

The two-dimensional cdf F_{X,Y}(x, y) has the following properties:

F_{X,Y}(x + h, y + k) + F_{X,Y}(x, y) ≥ F_{X,Y}(x + h, y) + F_{X,Y}(x, y + k).          (4.1.25)
F_{X,Y}(x + h, y) ≥ F_{X,Y}(x, y).          (4.1.26)
F_{X,Y}(x, y + k) ≥ F_{X,Y}(x, y).          (4.1.27)
F_{X,Y}(−∞, y) = 0.          (4.1.28)
F_{X,Y}(x, −∞) = 0.          (4.1.29)
F_{X,Y}(∞, ∞) = 1.          (4.1.30)

Here, h and k are non-negative constants. Properties (4.1.25)–(4.1.27) can be obtained from

P(a < X ≤ b, c < Y ≤ d) = F_{X,Y}(b, d) − F_{X,Y}(a, d) − F_{X,Y}(b, c) + F_{X,Y}(a, c) ≥ 0          (4.1.31)

for b ≥ a and d ≥ c, and imply that a joint cdf is a non-decreasing function. In addition, (4.1.28) and (4.1.29) imply P(∅) = 0 described by (2.2.16), and (4.1.30) implies P(Ω) = 1 discussed in (2.2.13).

Example 4.1.6 (Thomas 1986) Consider the function

F(x, y) = 1 if x ≥ 0, y ≥ 0, and x + y ≥ 1; and F(x, y) = 0 otherwise.          (4.1.32)

The function F(x, y) is continuous from the right and non-decreasing, and its values for x, y → ±∞ satisfy the properties of a cdf. However, F(x, y) is not a cdf because it does not satisfy (4.1.25) or, equivalently, (4.1.31): for example, with a = c = 1/3 and b = d = 1, we have F(b, d) − F(a, d) − F(b, c) + F(a, c) = F(1, 1) − F(1/3, 1) − F(1, 1/3) + F(1/3, 1/3) = 1 − 1 − 1 + 0 = −1. ♦

Because a joint cdf F(x, y) is a non-decreasing function, its derivative, the joint pdf f(x, y), satisfies f(x, y) ≥ 0 as described also in (4.1.6). In addition, from (4.1.21) and F(∞, ∞) = 1 observed in (4.1.30), or as mentioned in (4.1.7), we have ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.
For a discrete random vector (X, Y), similar results can be obtained. First, the joint pmf p_{X,Y}(x, y) = P(X = x, Y = y), satisfying

Σ_{x=−∞}^{∞} Σ_{y=−∞}^{∞} p_{X,Y}(x, y) = 1,          (4.1.33)

denotes the probability of the intersection {X = x, Y = y} of the two events {X = x}


and {Y = y}. The joint cdf can be expressed as

F_{X,Y}(x, y) = Σ_{u ≤ x, v ≤ y} p_{X,Y}(u, v)          (4.1.34)

in terms of the joint pmf and, conversely, the joint pmf p X,Y can be expressed as

p X,Y (x, y) = FX,Y (x, y) + FX,Y (x − 1, y − 1)


−FX,Y (x − 1, y) − FX,Y (x, y − 1) (4.1.35)

in terms of the joint cdf FX,Y when the support of p X,Y (x, y) is a subset of the integer
lattice points {(x, y) : x, y ∈ J} in the two-dimensional space. In addition, from the
joint pmf p X,Y , the pmf p X (x) = P(X = x) of X can be obtained as


p_X(x) = Σ_{y=−∞}^{∞} p_{X,Y}(x, y)          (4.1.36)

and the pmf of Y as p_Y(y) = P(Y = y) = Σ_{x=−∞}^{∞} p_{X,Y}(x, y).

Example 4.1.7 When the joint pmf of (X, Y) is

p_{X,Y}(u, v) = 1/3 for (u, v) = (1, 1), (2, 4), (3, 9); and 0 otherwise,          (4.1.37)

obtain the pmf of X.

Solution The pmf p_X(x) = Σ_v p_{X,Y}(x, v) = p_{X,Y}(x, 1) + p_{X,Y}(x, 4) + p_{X,Y}(x, 9) of X can be obtained as

p_X(x) = p_{X,Y}(1, 1) for x = 1; p_{X,Y}(2, 4) for x = 2; p_{X,Y}(3, 9) for x = 3; and 0 otherwise; that is,

p_X(x) = 1/3 for x = 1, 2, 3; and 0 otherwise          (4.1.38)

using (4.1.36). ♦
The probability that the random vector (X, Y) will have a value in a region D is obtained as

∫_D dF_{X,Y}(x, y) = ∫∫_D f_{X,Y}(x, y) dx dy for a continuous random vector, and
∫_D dF_{X,Y}(x, y) = Σ_{(x,y)∈D} p_{X,Y}(x, y) for a discrete random vector.          (4.1.39)

Fig. 4.1 The differential areas dx dy and dr r dθ in the perpendicular and polar coordinate systems, respectively

Example 4.1.8 For a random vector (X, Y) with the joint pdf f_{X,Y}(x, y) = (1/2πσ²) exp(−(x² + y²)/2σ²), obtain the probability P(X² + Y² ≤ a²) that (X, Y) will be on or inside the circle of radius a and center at the origin.

Solution From (4.1.39), we get

P(X² + Y² ≤ a²) = ∫∫_{x²+y²≤a²} (1/2πσ²) exp(−(x² + y²)/2σ²) dx dy.          (4.1.40)

We then have P(X² + Y² ≤ a²) = (1/2πσ²) ∫_0^a ∫_0^{2π} exp(−r²/2σ²) r dθ dr = −exp(−r²/2σ²)|_{r=0}^{a}, i.e.,

P(X² + Y² ≤ a²) = 1 − exp(−a²/2σ²)          (4.1.41)

by changing the perpendicular coordinate system (x, y) into the polar coordinate system (r, θ) as shown in Fig. 4.1. ♦
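A Monte Carlo experiment confirms (4.1.41); the sketch below (not part of the text, with arbitrarily chosen σ and a) draws independent zero-mean Gaussians and counts how often the point falls inside the circle:

```python
import math
import random

# Monte Carlo check of (4.1.41): for independent zero-mean Gaussian X, Y with
# common variance sigma^2, P(X^2 + Y^2 <= a^2) = 1 - exp(-a^2 / (2 sigma^2)).
random.seed(1)
sigma, a, n = 1.0, 1.5, 200_000
hits = sum(random.gauss(0, sigma)**2 + random.gauss(0, sigma)**2 <= a * a
           for _ in range(n))
empirical = hits / n
exact = 1 - math.exp(-a * a / (2 * sigma * sigma))
print(round(empirical, 3), round(exact, 3))  # the two values agree to ~2 decimals
```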

Example 4.1.9 The probability of a discrete random vector is sometimes called a point mass. Let X be the face value from a rolling of a fair die and Y = 2X. Then, the joint pmf p_{ij} = p_{X,Y}(i, j) = P(X = i, Y = j) of (X, Y) is

p_{ij} = 1/6 for i = 1, 2, ..., 6 and j = 2i; and 0 otherwise.          (4.1.42)

Next, let U = |face number of the first die − face number of the second die| and
V = (face number of the first die + face number of the second die) from a rolling
of a pair of dice. Then, we have

P(U = 0, V = i) = 1/36, i = 2, 4, ..., 12,          (4.1.43)
P(U = 1, V = i) = 2/36, i = 3, 5, ..., 11,          (4.1.44)
P(U = 2, V = i) = 2/36, i = 4, 6, 8, 10,          (4.1.45)
P(U = 3, V = i) = 2/36, i = 5, 7, 9,          (4.1.46)
P(U = 4, V = i) = 2/36, i = 6, 8,          (4.1.47)
P(U = 5, V = 7) = 2/36          (4.1.48)
as the point mass of (U, V ). ♦
Example 4.1.10 The probability of a two dimensional hybrid random vector of
which one element is a discrete random variable and the other is a continuous ran-
dom variable is called a line mass. For example, when X is the face number from
a rolling of a die and Y is a real number chosen randomly in the interval (0, 1),
P (X = x, y1 ≤ Y ≤ y2 ) is a line mass. ♦
Example 4.1.11 Assume that we repeatedly roll a fair die until the number of even-numbered outcomes is 10. Let N denote the number of rolls and X_i the number of occurrences of face i when the rolling ends. Obtain the pmf of X_2 and the pmf of X_1.

Solution (1) Let us obtain the pmf of X_2 in two ways.

(Method 1) The pmf p_2(k) = P(X_2 = k) = Σ_{j=10}^{∞} P(X_2 = k, N = j) of X_2 can be obtained² as

p_2(k) = Σ_{j=10}^{∞} [P(2 at the j-th rolling; among the remaining j − 1 rollings, k − 1 times of 2, 10 − k times of 4 or 6, and j − 10 times of 1, 3, or 5)
         + P(4 or 6 at the j-th rolling; among the remaining j − 1 rollings, k times of 2, 9 − k times of 4 or 6, and j − 10 times of 1, 3, or 5)]
       = Σ_{j=10}^{∞} [(1/6) ((j−1)! / ((k−1)!(10−k)!(j−10)!)) (1/6)^{k−1} (1/3)^{10−k} (1/2)^{j−10}
         + (1/3) ((j−1)! / (k!(9−k)!(j−10)!)) (1/6)^{k} (1/3)^{9−k} (1/2)^{j−10}]
       = {1/((k−1)!(10−k)!) + 1/(k!(9−k)!)} (1/6^k) (1/3^{10−k}) Σ_{j=10}^{∞} ((j−1)!/(j−10)!) (1/2)^{j−10}          (4.1.49)

for k = 0, 1, ..., 10. We subsequently have

p_2(k) = C(10, k) (1/3)^k (2/3)^{10−k}          (4.1.50)

for k = 0, 1, ..., 10 because Σ_{j=10}^{∞} ((j−1)!/(j−10)!) (1/2)^{j−10} = Σ_{q=0}^{∞} ((q+9)!/q!) (1/2)^q = 2^{10} · 9! from Σ_{x=0}^{∞} C(r+x−1, x) (1−α)^x = α^{−r} shown in (2.5.16). Thus, X_2 ∼ b(10, 1/3).

(Method 2) Among the 10 occurrences of even numbers, 2 can occur 0, 1, ..., 10 times, and the probability of 2 among {2, 4, 6} is 1/3. Thus, the probability that 2 will occur k times is

p_2(k) = C(10, k) (1/3)^k (2/3)^{10−k}          (4.1.51)

for k = 0, 1, ..., 10.

(2) Similarly, recollecting Σ_{x=0}^{∞} C(r+x−1, x) (1−α)^x = α^{−r} shown in (2.5.16), we can obtain the pmf p_1(k) = P(X_1 = k) = Σ_{j=10+k}^{∞} P(X_1 = k, N = j) of X_1 as

p_1(k) = Σ_{j=10+k}^{∞} P(an even number at the j-th rolling; among the remaining j − 1 rollings, k times of 1, 9 times of even numbers, and j − k − 10 times of 3 or 5)
       = Σ_{j=10+k}^{∞} (1/2) ((j−1)! / ((j−k−10)! k! 9!)) (1/6)^k (1/2)^9 (1/3)^{j−k−10}          (4.1.52)

for k = 0, 1, ..., which can be rewritten as

p_1(k) = C(k+9, 9) (1/4)^k (3/4)^{10}          (4.1.53)

from Σ_{j=10+k}^{∞} (1/2) ((j−1)!/((j−k−10)! k! 9!)) (1/6)^k (1/2)^9 (1/3)^{j−k−10} = Σ_{i=0}^{∞} ((i+k+9)!/(i! k! 9!)) (1/6)^k (1/2)^{10} (1/3)^i = ((k+9)!/(k! 9!)) (1/6)^k (1/2)^{10} Σ_{i=0}^{∞} ((i+k+9)!/(i! (k+9)!)) (1/3)^i = ((k+9)!/(k! 9!)) (1/6)^k (1/2)^{10} (2/3)^{−(k+10)} = ((k+9)!/(k! 9!)) (1/4)^k (3/4)^{10}. Thus, X_1 ∼ NB(10, 3/4) with the pmf (2.5.14).

² Here, the events '2 occurs k − 1 times' for k = 0 and '4 or 6 occurs 9 − k times' for k = 10 are both empty events and thus have probability 0, which can be confirmed from (−1)! → ±∞.
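The two distributions derived in Example 4.1.11 can be checked by direct simulation, independently of the algebra above. Both X_2 ∼ b(10, 1/3) and X_1 ∼ NB(10, 3/4) have mean 10/3; a simulation sketch (not part of the text):

```python
import random

# Simulation of Example 4.1.11: roll a fair die until ten even numbers have
# appeared; X2 counts the 2's and X1 the 1's among all rolls.
# Theory: X2 ~ b(10, 1/3) and X1 ~ NB(10, 3/4), both with mean 10/3.
random.seed(7)
trials = 50_000
sum_x1 = sum_x2 = 0
for _ in range(trials):
    evens = x1 = x2 = 0
    while evens < 10:
        face = random.randint(1, 6)
        evens += face % 2 == 0
        x1 += face == 1
        x2 += face == 2
    sum_x1 += x1
    sum_x2 += x2
print(round(sum_x1 / trials, 2), round(sum_x2 / trials, 2))  # both near 10/3 ≈ 3.33
```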


4.1.3 Independent Random Vectors

Definition 4.1.5 (independent random vector) When the joint probability function
of a random vector is the product of the marginal probability functions of the element
random variables, the random vector is called an independent random vector.
In other words, if the joint cdf F_X, joint pdf f_X, or joint pmf p_X of a random vector X = (X_1, X_2, ..., X_n) can be expressed as

F_X(x) = ∏_{i=1}^{n} F_{X_i}(x_i),          (4.1.54)

f_X(x) = ∏_{i=1}^{n} f_{X_i}(x_i)  (continuous random vector),          (4.1.55)

or

p_X(x) = ∏_{i=1}^{n} p_{X_i}(x_i)  (discrete random vector),          (4.1.56)

respectively, for every real vector x, the random vector X is an independent random vector. The random variables of an independent random vector are all independent of each other (Burdick 1992; Davenport 1970; Dawid 1979; Geisser and Mantel 1962; Gray and Davisson 2010; Papoulis and Pillai 2002; Wang 1979).
Example 4.1.12 Assume the joint pmf

p_{X,Y}(u, v) = 1/3 for (u, v) = (1, 1), (2, 4), (3, 9); and 0 otherwise          (4.1.57)

of (X, Y) discussed in Example 4.1.7. Then, using (4.1.36), we can obtain the pmf p_Y(w) = Σ_u p_{X,Y}(u, w) = p_{X,Y}(1, w) + p_{X,Y}(2, w) + p_{X,Y}(3, w) of Y as

p_Y(w) = 1/3 for w = 1, 4, or 9; and 0 otherwise.          (4.1.58)

It is clear that p_{X,Y}(u, w) ≠ p_X(u) p_Y(w) from (4.1.38) and (4.1.58). Thus, (X, Y) is not an independent random vector. ♦
Example 4.1.13 A needle of length 2a is tossed at random on a plane ruled with
infinitely many parallel lines of an infinite length, where the distance between adja-

Fig. 4.2 Buffon's needle when a ≤ b

Fig. 4.3 Buffon's needle when a ≥ b, where θ_T = sin^{−1}(b/a)
cent lines is 2b as shown in Figs. 4.2 and 4.3. Assuming that the thickness of the
needle is negligible, find the probability PB that the needle touches one or more of
the parallel lines when a ≤ b.

Solution Let us denote by X the distance from the center of the needle to the nearest parallel line and by Θ the smaller angle that the needle makes with the lines. Then, X ∼ U[0, b), Θ ∼ U[0, π/2), and X and Θ are independent. Thus, the joint pdf f_{X,Θ}(x, θ) = f_X(x) f_Θ(θ) of (X, Θ) can be expressed as

f_{X,Θ}(x, θ) = 2/(πb),  0 ≤ x < b, 0 ≤ θ < π/2.          (4.1.59)

Now, when a ≤ b, recollecting that {(x, θ): x ≤ a sin θ} = {(x, θ): 0 ≤ θ < π/2, 0 ≤ x ≤ a sin θ}, we get³ P_B = P((X, Θ): X < a sin Θ) = (2/πb) ∫_0^{π/2} a sin θ dθ as

P_B = 2a/(πb).          (4.1.60)

The probability (4.1.60) is proportional to the length of the needle and inversely proportional to the interval of the parallel lines, which is also appealing intuitively. When a → 0 or b → ∞, we have P_B → 0. ♦

 
³ Considering {(x, θ): x < a sin θ} = {(x, θ): 0 ≤ x ≤ a, sin^{−1}(x/a) ≤ θ < π/2}, the result (4.1.60) can be obtained also as P_B = (2/πb) ∫_0^a ∫_{sin^{−1}(x/a)}^{π/2} dθ dx = (2/πb) ∫_0^a (π/2 − sin^{−1}(x/a)) dx = a/b − (2/πb) ∫_0^a sin^{−1}(x/a) dx = a/b − (2a/πb) ∫_0^{π/2} t cos t dt = a/b − (2a/πb)(t sin t + cos t)|_{t=0}^{π/2} = 2a/(πb).

Fig. 4.4 Buffon's needle: the probability P_B that the needle touches parallel lines, as a function of a/b. Marked values: 2/π ≈ 0.6366 at a/b = 1 and (2/3π)(6 + π − 3√3) ≈ 0.8372 at a/b = 2

Example 4.1.14 In Example 4.1.13, find the probability when a ≥ b.

Solution With the interval of integration {(x, θ): sin^{−1}(b/a) ≤ θ < π/2, 0 ≤ x ≤ b} ∪ {(x, θ): 0 ≤ θ ≤ sin^{−1}(b/a), 0 ≤ x ≤ a sin θ} in mind, P_B = (2/πb)[∫_{sin^{−1}(b/a)}^{π/2} (∫_0^b dx) dθ + ∫_0^{sin^{−1}(b/a)} (∫_0^{a sin θ} dx) dθ] = (2/π)(π/2 − sin^{−1}(b/a)) + (2a/πb) ∫_0^{sin^{−1}(b/a)} sin θ dθ, i.e.,

P_B = 1 + (2/πb)(a − √(a² − b²) − b sin^{−1}(b/a)).          (4.1.61)

The same result can be obtained also as P_B = (2/πb) ∫_0^b ∫_{sin^{−1}(x/a)}^{π/2} dθ dx = (2/πb) ∫_0^b (π/2 − sin^{−1}(x/a)) dx = 1 − (2/πb) ∫_0^b sin^{−1}(x/a) dx = 1 − (2a/πb) ∫_0^{sin^{−1}(b/a)} t cos t dt = 1 − (2a/πb)(t sin t + cos t)|_{t=0}^{sin^{−1}(b/a)} = 1 − (2/πb)(b sin^{−1}(b/a) + √(a² − b²) − a), with the interval of integration {(x, θ): 0 ≤ x ≤ b, sin^{−1}(x/a) ≤ θ < π/2}. ♦

It is easy to see that P_B → 2/π when a → b from (4.1.60) and (4.1.61), and that P_B → 1 when b is finite and a → ∞ from (4.1.61). Figure 4.4 shows the probability P_B as a function of a/b.
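Both formulas (4.1.60) and (4.1.61) can be checked by simulating the needle toss directly from the joint pdf (4.1.59); a sketch (not part of the text, with arbitrarily chosen a and b):

```python
import math
import random

# Monte Carlo check of Buffon's needle: half-length a, lines spaced 2b apart;
# X ~ U[0, b) is the distance from the needle's center to the nearest line and
# Theta ~ U[0, pi/2) the needle's angle; the needle touches a line iff X < a sin(Theta).
def buffon(a, b, n=200_000, seed=3):
    rng = random.Random(seed)
    hits = sum(rng.uniform(0, b) < a * math.sin(rng.uniform(0, math.pi / 2))
               for _ in range(n))
    return hits / n

print(round(buffon(0.5, 1.0), 3))  # a <= b case (4.1.60): near 2a/(pi b) = 1/pi ≈ 0.318
exact_long = 1 + (2 / math.pi) * (2 - math.sqrt(3) - math.asin(0.5))  # (4.1.61), a=2, b=1
print(round(buffon(2.0, 1.0), 3), round(exact_long, 3))  # both near 0.837
```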

Definition 4.1.6 (independent and identically distributed random vectors) An inde-


pendent random vector is called an independent and identically distributed (i.i.d.)
random vector when the marginal distributions of all the element random variables
are identical.

For an i.i.d. random vector X = (X_1, X_2, ..., X_n), we have the cdf

F_X(x) = ∏_{i=1}^{n} F_X(x_i),          (4.1.62)

the pdf

f_X(x) = ∏_{i=1}^{n} f_X(x_i)          (4.1.63)

when X is a continuous random vector, or the pmf

p_X(x) = ∏_{i=1}^{n} p_X(x_i)          (4.1.64)

when X is a discrete random vector, for every real-valued vector x.

Example 4.1.15 Let X = (X_1, X_2) be the pair of face numbers from a rolling of a pair of fair dice. Then, X is an i.i.d. random vector. Here, we have p_X(x_1, x_2) = p_{X_1}(x_1) p_{X_2}(x_2) for every value of x_1 and x_2, and p_{X_1}(x_i) = p_{X_2}(x_i) = p_X(x_i) for every value of x_i. In addition, p_X(x_i) = 1/6 for x_i = 1, 2, ..., 6 and 0 otherwise. ♦

Definition 4.1.7 (two random vectors independent of each other) Consider two random vectors X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_m). When the joint cdf F_{X_d,Y_d} of two subvectors X_d = (X_{t_1}, X_{t_2}, ..., X_{t_k}) and Y_d = (Y_{s_1}, Y_{s_2}, ..., Y_{s_l}) satisfies

F_{X_d,Y_d}(x_d, y_d) = F_{X_d}(x_d) F_{Y_d}(y_d)          (4.1.65)

for all natural numbers k = 1, 2, ..., n, l = 1, 2, ..., m, t_1, t_2, ..., t_k, s_1, s_2, ..., and s_l, the two random vectors X and Y are called independent of each other, where x_d = (x_1, x_2, ..., x_k) and y_d = (y_1, y_2, ..., y_l).

If the random vectors X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_m) are independent of each other, then

f_{X_d,Y_d}(x_d, y_d) = f_{X_d}(x_d) f_{Y_d}(y_d)          (4.1.66)

when X and Y are continuous random vectors, and

p_{X_d,Y_d}(x_d, y_d) = p_{X_d}(x_d) p_{Y_d}(y_d)          (4.1.67)

when X and Y are discrete random vectors, for all subvectors X_d of X and Y_d of Y.
We can similarly define the independence among several random vectors by generalizing Definition 4.1.7. Note that the independence between X and Y has nothing to do with whether X or Y is itself an independent random vector. In other words, even when X and Y are independent of each other, X or Y may or may not be an independent random vector, and even when X and Y are both independent random vectors, X and Y may or may not be independent of each other.

Theorem 4.1.3 Assume the two random vectors X 1 = (X 1 , X 2 , . . . , X m ) and X 2 =


(X m+1 , X m+2 , . . . , X m+n ) are independent of each other, and the elements of the two
vectors g(·) = (g1 (·), g2 (·), . . . , gm (·)) and h(·) = (h 1 (·), h 2 (·), . . . , h n (·)) are all
continuous functions. Then, the two random vectors Y 1 = g (X 1 ) and Y 2 = h (X 2 )
are independent of each other.

Proof For convenience, we show the result for the simplest case of m = n = 1. Let A_{y_1} and B_{y_2} be the inverse images of {Y_1 ≤ y_1} for Y_1 = g(X_1) and of {Y_2 ≤ y_2} for Y_2 = h(X_2), respectively. Then, because P(Y_1 ≤ y_1, Y_2 ≤ y_2) = P(X_1 ∈ A_{y_1}, X_2 ∈ B_{y_2}), we have the joint cdf F_{Y_1,Y_2}(y_1, y_2) = P(X_1 ∈ A_{y_1}, X_2 ∈ B_{y_2}) = P(X_1 ∈ A_{y_1}) P(X_2 ∈ B_{y_2}) = F_{Y_1}(y_1) F_{Y_2}(y_2). ♠

Example 4.1.16 (Stoyanov 2013) We can easily show that, if g(X ) and h(Y ) are
independent of each other and g and h are both one-to-one correspondences, then
X and Y are also independent of each other. On the other hand, the converse of
Theorem 4.1.3 does not hold true in general. In other words, when g or h is a
continuous function but is not a one-to-one correspondence, X and Y may not be
independent of each other even when g(X ) and h(Y ) are independent of each other.
Let us consider one such example of discrete random variables.
Assume the joint pmf

p_{X,Y}(−1, −1) = 3/32, p_{X,Y}(0, −1) = 5/32, p_{X,Y}(1, −1) = 3/32,
p_{X,Y}(−1, 0) = 5/32, p_{X,Y}(0, 0) = 8/32, p_{X,Y}(1, 0) = 3/32,          (4.1.68)
p_{X,Y}(−1, 1) = 1/32, p_{X,Y}(0, 1) = 3/32, p_{X,Y}(1, 1) = 1/32

of (X, Y). Then, p_X(−1) = p_{X,Y}(−1, −1) + p_{X,Y}(−1, 0) + p_{X,Y}(−1, 1) = 9/32, and similarly p_X(0) = 1/2 and p_X(1) = 7/32. In addition, p_Y(−1) = p_{X,Y}(−1, −1) + p_{X,Y}(0, −1) + p_{X,Y}(1, −1) = 11/32, p_Y(0) = 1/2, and p_Y(1) = 5/32. Thus, p_{X,Y}(−1, −1) = 3/32 ≠ 99/32² = p_X(−1) p_Y(−1), and X and Y are not independent of each other.

Now, consider g(X) = X² and h(Y) = Y². Then, p_{X²,Y²}(0, 0) = p_{X,Y}(0, 0) = 1/4, p_{X²,Y²}(0, 1) = p_{X,Y}(0, −1) + p_{X,Y}(0, 1) = 1/4, p_{X²,Y²}(1, 0) = p_{X,Y}(−1, 0) + p_{X,Y}(1, 0) = 1/4, and p_{X²,Y²}(1, 1) = p_{X,Y}(−1, −1) + p_{X,Y}(−1, 1) + p_{X,Y}(1, −1) + p_{X,Y}(1, 1) = 1/4. In other words, p_{X²}(0) = p_{X²,Y²}(0, 0) + p_{X²,Y²}(0, 1) = 1/2. Similarly, we get p_{X²}(1) = 1/2, p_{Y²}(0) = 1/2, and p_{Y²}(1) = 1/2. Thus,

p_{X²,Y²}(i, j) = p_{X²}(i) p_{Y²}(j)          (4.1.69)

for i, j = 0, 1, implying that X² and Y² are independent of each other. ♦
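The bookkeeping in this example is easy to verify exactly with rational arithmetic; a check sketch (not part of the text, standard library only):

```python
from fractions import Fraction as F

# Check of Example 4.1.16: with the joint pmf (4.1.68), X and Y are
# dependent, yet X^2 and Y^2 are independent.
p = {(-1, -1): F(3, 32), (0, -1): F(5, 32), (1, -1): F(3, 32),
     (-1, 0): F(5, 32), (0, 0): F(8, 32), (1, 0): F(3, 32),
     (-1, 1): F(1, 32), (0, 1): F(3, 32), (1, 1): F(1, 32)}

pX = {i: sum(v for (x, _), v in p.items() if x == i) for i in (-1, 0, 1)}
pY = {j: sum(v for (_, y), v in p.items() if y == j) for j in (-1, 0, 1)}
print(p[(-1, -1)] == pX[-1] * pY[-1])          # False: X and Y are dependent

pSq = {(i, j): sum(v for (x, y), v in p.items() if (x * x, y * y) == (i, j))
       for i in (0, 1) for j in (0, 1)}
pXSq = {i: pSq[(i, 0)] + pSq[(i, 1)] for i in (0, 1)}
pYSq = {j: pSq[(0, j)] + pSq[(1, j)] for j in (0, 1)}
print(all(pSq[(i, j)] == pXSq[i] * pYSq[j]
          for i in (0, 1) for j in (0, 1)))     # True: X^2 and Y^2 are independent
```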

4.2 Distributions of Functions of Random Vectors

In this section, we will focus on the distributions of functions (Horn and Johnson
1985; Leon-Garcia 2008) of random vectors in terms of the cdf, pdf, and pmf.

4.2.1 Joint Probability Density Function

For the transformation Y = g (X), if the region |d x| of X is mapped to the region


|d y| of Y , we have

f Y ( y) |d y| = f X (x) |d x| . (4.2.1)

In other words, the probability f X (x) |d x| that X is in |d x| will be the same as the
probability f Y ( y) |d y| that Y is in |d y|, a type of conservation laws.

4.2.1.1 General Formula

Based on (4.2.1), the pdf f Y for the function Y = (Y1 , Y2 , . . . , Ym ) = g (X) of X =


(X 1 , X 2 , . . . , X n ) for m = n can be obtained as described in the following theorem:

Theorem 4.2.1 Denote the Jacobian⁴ of g(x) = (g_1(x), g_2(x), ..., g_n(x)) by J(g(x)) = |(∂/∂x) g(x)|, i.e., the determinant

J(g(x)) = | ∂g_1/∂x_1  ∂g_1/∂x_2  ···  ∂g_1/∂x_n |
          | ∂g_2/∂x_1  ∂g_2/∂x_2  ···  ∂g_2/∂x_n |
          |     ⋮           ⋮       ⋱       ⋮    |
          | ∂g_n/∂x_1  ∂g_n/∂x_2  ···  ∂g_n/∂x_n |.          (4.2.2)

Then, the joint pdf of Y = (Y_1, Y_2, ..., Y_n) = g(X) = (g_1(X), g_2(X), ..., g_n(X)) for a random vector X = (X_1, X_2, ..., X_n) can be obtained as

f_Y(y) = Σ_{j=1}^{Q} [f_X(x) / |J(g(x))|]_{x = x_j(y)},          (4.2.3)

where {x_j(y)}_{j=1}^{Q} = {(x_1^{(j)}(y), x_2^{(j)}(y), ..., x_n^{(j)}(y))}_{j=1}^{Q} are the solutions to the simultaneous equations g_1(x) = y_1, g_2(x) = y_2, ..., g_n(x) = y_n, i.e., {x_i}_{i=1}^{n} expressed in terms of {y_i}_{i=1}^{n}.

⁴ The Jacobian is also referred to as the transformation Jacobian or Jacobian determinant. In addition, from the property of determinants, we also have

J(g(x)) = | ∂g_1/∂x_1  ∂g_2/∂x_1  ···  ∂g_n/∂x_1 |
          | ∂g_1/∂x_2  ∂g_2/∂x_2  ···  ∂g_n/∂x_2 |
          |     ⋮           ⋮       ⋱       ⋮    |
          | ∂g_1/∂x_n  ∂g_2/∂x_n  ···  ∂g_n/∂x_n |.

Example 4.2.1 When the joint pdf of X = (X_1, X_2) is

f_X(x_1, x_2) = u(x_1) u(1 − x_1) u(x_2) u(1 − x_2),          (4.2.4)

obtain the joint pdf f_Y of Y = (Y_1, Y_2) = (3X_1 − X_2, −X_1 + X_2). Next, based on the joint pdf f_Y, obtain the pdf's of Y_1 = 3X_1 − X_2 and Y_2 = −X_1 + X_2.

Solution The Jacobian of the transformation g(x_1, x_2) = (3x_1 − x_2, −x_1 + x_2) is J(g(x_1, x_2)) = |∂g(x_1, x_2)/∂(x_1, x_2)| = det[3, −1; −1, 1] = 2. In addition, solving the simultaneous equations y_1 = 3x_1 − x_2 and y_2 = −x_1 + x_2, we get x_1 = y_1/2 + y_2/2 and x_2 = y_1/2 + 3y_2/2 and, consequently, we have Q = 1. Therefore, the joint pdf f_Y(y_1, y_2) of Y can be obtained as

f_Y(y_1, y_2) = (1/2) f_X(x_1, x_2)|_{(x_1, x_2) = (y_1/2 + y_2/2, y_1/2 + 3y_2/2)}
             = (1/2) u(y_1/2 + y_2/2) u(1 − y_1/2 − y_2/2) u(y_1/2 + 3y_2/2) u(1 − y_1/2 − 3y_2/2)
             = (1/2) u(y_1 + y_2) u(2 − y_1 − y_2) u(y_1 + 3y_2) u(2 − y_1 − 3y_2).          (4.2.5)
Figure 4.5 shows the support V_X of the pdf f_X(x_1, x_2) and also the support

V_Y = {(y_1, y_2): 0 < y_1 + y_2 < 2, 0 < y_1 + 3y_2 < 2}          (4.2.6)

of the pdf f_Y(y_1, y_2). Here, f_Y(y_1, y_2) = 1/2 for (y_1, y_2) ∈ V_Y and f_Y(y_1, y_2) = 0 otherwise. It is also easy to see that the area of V_Y is 2. Now, keeping in mind the support V_Y of the joint pdf f_Y(y_1, y_2) shown in Fig. 4.5 for integration, we can obtain the marginal pdf

f_{Y_1}(y_1) = ∫_{−∞}^{∞} f_Y(y_1, y_2) dy_2
            = (1/2) ∫_{−y_1}^{−(y_1−2)/3} dy_2 = (y_1 + 1)/3 for −1 < y_1 ≤ 0;
            = (1/2) ∫_{−y_1/3}^{−(y_1−2)/3} dy_2 = 1/3 for 0 < y_1 ≤ 2;
            = (1/2) ∫_{−y_1/3}^{2−y_1} dy_2 = 1 − y_1/3 for 2 < y_1 ≤ 3;
            = 0 otherwise          (4.2.7)

of Y_1 and the marginal pdf



x2 y2
1
1

VY 2 3
−1 0 y1
VX

−1
0 1 x1

Fig. 4.5 Supports VX and VY of the joint pdf’s f X (x1 , x2 ) and f Y (y1 , y2 ), respectively, when the
transformation is (Y1 , Y2 ) = (3X 1 − X 2 , −X 1 + X 2 )

Fig. 4.6 The pdf's f_{Y_1}(y_1) of Y_1 = 3X_1 − X_2 and f_{Y_2}(y_2) of Y_2 = −X_1 + X_2 when the joint pdf of X = (X_1, X_2) is f_X(x_1, x_2) = u(x_1) u(1 − x_1) u(x_2) u(1 − x_2)

f_{Y_2}(y_2) = (1/2) ∫_{−3y_2}^{2−y_2} dy_1 = 1 + y_2 for −1 < y_2 ≤ 0;
            = (1/2) ∫_{−y_2}^{−3y_2+2} dy_1 = 1 − y_2 for 0 < y_2 ≤ 1;
            = 0 otherwise          (4.2.8)

of Y_2 by integrating the joint pdf f_Y(y_1, y_2). Figure 4.6 shows the pdf's f_{Y_1}(y_1) and f_{Y_2}(y_2). ♦
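The marginal pdf (4.2.7) can be checked by sampling; with X_1, X_2 ∼ U(0, 1) independent, Y_1 = 3X_1 − X_2 should put probability 2/3 on the flat region (0, 2] and 1/6 on the triangular region (−1, 0]. A simulation sketch (not part of the text):

```python
import random

# Monte Carlo check of the marginal pdf (4.2.7): Y1 = 3*X1 - X2 with
# X1, X2 ~ U(0, 1) independent has density 1/3 on (0, 2], so
# P(0 < Y1 <= 2) = 2/3, and P(Y1 <= 0) = 1/6 (area under (y1 + 1)/3 on (-1, 0]).
random.seed(5)
n = 200_000
y1 = [3 * random.random() - random.random() for _ in range(n)]
print(round(sum(0 < v <= 2 for v in y1) / n, 2))  # ≈ 0.67
print(round(sum(v <= 0 for v in y1) / n, 2))      # ≈ 0.17
```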

When m < n, the joint pdf of Y can be obtained from Theorem 4.2.1 by employing auxiliary variables: details are discussed in Sect. 4.2.2. It is not possible to obtain the joint pdf of g(X) for m > n except in very special cases, as discussed in Sect. 4.5. When J(g(x)) = 0 and m = n, we cannot use Theorem 4.2.1 to obtain the joint pdf of g(X) because the denominator in (4.2.3) is 0: in Sect. 4.5, we briefly discuss, with some examples, how such cases can be handled.

4.2.1.2 One-to-One Transformations



Theorem 4.2.2 When the inverse function g^{−1} = (g_1^{−1}, g_2^{−1}, ..., g_n^{−1}) of g exists and is differentiable, we have Q = 1 and (4.2.3) can be written as

f_Y(y) = |J(g^{−1}(y))| f_X(g^{−1}(y)),          (4.2.9)

where J(g^{−1}(y)) = |(∂/∂y) g^{−1}(y)| is the Jacobian of g^{−1}(y).

Note that we have J(g^{−1}(y)) = 1/J(g(x)). The Jacobian J(g(x)) of g(x) is written also as J(y) or |∂y/∂x|, and the Jacobian J(g^{−1}(y)) is expressed also as J(x) or |∂x/∂y|.

Example 4.2.2 Consider the linear transformation Y = AX of X = (X_1, X_2)^T, where A = [a, b; c, d]. Then, if ad − bc ≠ 0, we have f_Y(y) = (1/|det A|) f_X(A^{−1} y) from (4.2.9). More generally, when A is an n × n matrix and b is an n × 1 vector, the pdf of Y = AX + b is

f_Y(y) = (1/|det A|) f_X(A^{−1}(y − b))          (4.2.10)

if det A = |A| ≠ 0. ♦
Example 4.2.3 When X ∼ G(α_1, β) and Y ∼ G(α_2, β) are independent of each other, show that Z = X + Y and S = X/(X + Y) are independent of each other, and obtain the pdf of S.

Solution Expressing X and Y as X = ZS and Y = Z − ZS in terms of Z and S, we have the Jacobian J(g^{−1}(z, s)) = |∂g^{−1}(z, s)/∂(z, s)| = det[s, z; 1 − s, −z] = −z of the inverse transformation (X, Y) = g^{−1}(Z, S) = (ZS, Z − ZS). Thus, the joint pdf f_{Z,S}(z, s) = |z| f_{X,Y}(x, y)|_{x=zs, y=z−zs} of (Z, S) is

f_{Z,S}(z, s) = (|z| (zs)^{α_1−1} (z − zs)^{α_2−1} / (β^{α_1+α_2} Γ(α_1) Γ(α_2))) exp(−zs/β) exp(−(z − zs)/β) u(zs) u(z − zs)
             = {(z^{α_1+α_2−1} / (β^{α_1+α_2} Γ(α_1 + α_2))) exp(−z/β) u(z)}
               × {(Γ(α_1 + α_2) / (Γ(α_1) Γ(α_2))) s^{α_1−1} (1 − s)^{α_2−1} u(1 − s) u(s)}.          (4.2.11)

This result dictates that Z = X + Y and S = X/(X + Y) are independent of each other, the pdf of Z is f_Z(z) = (z^{α_1+α_2−1} / (β^{α_1+α_2} Γ(α_1 + α_2))) exp(−z/β) u(z), and the pdf of S is f_S(s) = (Γ(α_1 + α_2) / (Γ(α_1) Γ(α_2))) s^{α_1−1} (1 − s)^{α_2−1} u(1 − s) u(s). Note that S = X/(X + Y) ∼ B(α_1, α_2). ♦

4.2.2 Joint Probability Density Function: Method of Auxiliary Variables

When m < n for a random vector X = (X_1, X_2, ..., X_n) and its function Y = g(X) = (Y_1, Y_2, ..., Y_m), we first choose auxiliary variables as Y_{m+1} = X_{m+1}, Y_{m+2} = X_{m+2}, ..., Y_n = X_n. We next obtain the pdf f_{Y_a} of Y_a = (Y_1, Y_2, ..., Y_m, Y_{m+1}, Y_{m+2}, ..., Y_n) using Theorem 4.2.1. Then, we obtain the pdf of Y as

f_Y(y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f_{Y_a}(y_a) dy_{m+1} dy_{m+2} ··· dy_n          (4.2.12)

by integrating f_{Y_a} over the auxiliary variables, where y = (y_1, y_2, ..., y_m) and y_a = (y_1, y_2, ..., y_n).

Example 4.2.4 Obtain the pdf of Y = X_1 + X_2 for X = (X_1, X_2) with the joint pdf f_X.

Solution Let Y_1 = X_1 + X_2, Y_2 = X_2, Y_a = [Y_1 Y_2]^T, and X = [X_1 X_2]^T. Then, we have Y_a = AX, where A = [1, 1; 0, 1]. Because A^{−1} = [1, −1; 0, 1], we get f_{Y_a}(y_a) = (1/|det A|) f_X(A^{−1} y_a) = f_{X_1,X_2}(y_1 − y_2, y_2) based on Theorem 4.2.1. Integrating this result, we have

f_Y(y) = ∫_{−∞}^{∞} f_{X_1,X_2}(y − y_2, y_2) dy_2.          (4.2.13)

This example is discussed again in Example 4.2.13. ♦

When X_1 and X_2 are independent of each other, we have

f_Y(y) = ∫_{−∞}^{∞} f_{X_1}(y − x) f_{X_2}(x) dx
       = ∫_{−∞}^{∞} f_{X_1}(x) f_{X_2}(y − x) dx   (4.2.14)

from (4.2.13). In other words, the pdf of the sum of two random variables independent of each other is the convolution

f_{X+Y} = f_X ∗ f_Y   (4.2.15)

of the pdf's of the two random variables.
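The convolution rule (4.2.15) can be illustrated numerically. The following sketch (not part of the original text; the grid step dx is an arbitrary discretization choice) convolves two Uniform(0, 1) pdfs and compares the result with the triangular pdf of the sum, which is y on [0, 1] and 2 − y on [1, 2]:

```python
import numpy as np

# f_X = f_Y = u(x)u(1-x) sampled on a grid; discrete convolution times dx
# approximates the convolution integral in Eq. (4.2.15).
dx = 0.001
x = np.arange(0.0, 1.0, dx)
f = np.ones_like(x)                      # unit-box pdf on [0, 1)
conv = np.convolve(f, f) * dx            # approximate pdf of X + Y
y = np.arange(len(conv)) * dx            # grid for the sum in [0, 2)
triangle = np.where(y <= 1.0, y, 2.0 - y)  # exact triangular pdf
max_err = np.max(np.abs(conv - triangle))
```

The discretization error is on the order of dx, and the approximate pdf still integrates to 1.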

Example 4.2.5 (Thomas 1986) Obtain the pdf of X_1, the pdf of X_2, and the pdf of Y_1 = X_1 + X_2 for a random vector (X_1, X_2) with the joint pdf f_{X_1,X_2}(x_1, x_2) = 24 x_1 (1 − x_2) u(x_1) u(x_2 − x_1) u(1 − x_2).
276 4 Random Vectors

Fig. 4.7 The region A = {(x_1, x_2) : u(x_1) u(x_2 − x_1) u(1 − x_2) > 0} = {(x_1, x_2) : 0 ≤ x_1 ≤ x_2, 0 ≤ x_2 ≤ 1} and the corresponding region B = {(y_1, y_2) : u(y_1 − y_2) u(2y_2 − y_1) u(1 − y_2) > 0} = {(y_1, y_2) : y_2 ≤ y_1 ≤ 2y_2, 0 ≤ y_2 ≤ 1} of the transformation (y_1, y_2) = (x_1 + x_2, x_2). The pdf f_{Y_1}(y_1) of Y_1 = X_1 + X_2 when the joint pdf of X_1 and X_2 is f_{X_1,X_2}(x_1, x_2) = 24 x_1 (1 − x_2) u(x_1) u(x_2 − x_1) u(1 − x_2)

Solution First, the support of f_{X_1,X_2}(x_1, x_2) is the region A shown in Fig. 4.7. Recollecting that u(x_2 − x_1) u(1 − x_2) is non-zero only on {(x_1, x_2) : x_1 < x_2 < 1} = {x_1 : x_1 < 1} ∩ {x_2 : x_1 < x_2 < 1}, the pdf of X_1 can be obtained as^5

f_{X_1}(x_1) = 12 x_1 (1 − x_1)^2 u(x_1) u(1 − x_1),   (4.2.16)

for which ∫_{−∞}^{∞} f_{X_1}(x_1) dx_1 = ∫_0^1 12 x_1 (1 − x_1)^2 dx_1 = 12(1/2 − 2/3 + 1/4) = 1. Next, recollecting that u(x_1) u(x_2 − x_1) is non-zero only on {(x_1, x_2) : 0 < x_1 < x_2} = {x_1 : 0 < x_1 < x_2} ∩ {x_2 : x_2 > 0}, we can obtain^6 the pdf of X_2 as

f_{X_2}(x_2) = ∫_{−∞}^{∞} 24 x_1 (1 − x_2) u(x_1) u(x_2 − x_1) dx_1 u(1 − x_2)
            = 12 x_2^2 (1 − x_2) u(x_2) u(1 − x_2),   (4.2.17)

for which we clearly have ∫_{−∞}^{∞} f_{X_2}(x_2) dx_2 = ∫_0^1 12 (x_2^2 − x_2^3) dx_2 = 1.
Next, let us obtain the pdf of Y_1 = X_1 + X_2. Choosing the auxiliary variable as Y_2 = X_2, we have X_1 = Y_1 − Y_2, X_2 = Y_2, and (X_1, X_2) = g^{-1}(Y_1, Y_2) = (Y_1 − Y_2, Y_2). The Jacobian of the inverse transformation is |∂g^{-1}(y_1, y_2)/∂(y_1, y_2)| = det[[1, −1], [0, 1]] = 1. Thus, the joint pdf of (Y_1, Y_2) is f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(x_1, x_2)|_{x_1=y_1−y_2, x_2=y_2}, i.e.,

f_{Y_1,Y_2}(y_1, y_2) = 24 (y_1 − y_2)(1 − y_2) u(y_1 − y_2) u(2y_2 − y_1) u(1 − y_2),   (4.2.18)

^5 More specifically, we have f_{X_1}(x_1) = ∫_{−∞}^{∞} f_{X_1,X_2}(x_1, x_2) dx_2 = ∫_{−∞}^{∞} 24 x_1 (1 − x_2) u(x_2 − x_1) u(1 − x_2) dx_2 u(x_1) = ∫_{x_1}^{1} 24 x_1 (1 − x_2) dx_2 u(x_1) u(1 − x_1) = 12 x_1 (1 − x_1)^2 u(x_1) u(1 − x_1).
^6 More specifically, we have f_{X_2}(x_2) = ∫_{−∞}^{∞} f_{X_1,X_2}(x_1, x_2) dx_1 = ∫_{−∞}^{∞} 24 x_1 (1 − x_2) u(x_1) u(x_2 − x_1) dx_1 u(1 − x_2) = ∫_0^{x_2} 24 x_1 (1 − x_2) dx_1 u(x_2) u(1 − x_2) = 12 x_2^2 (1 − x_2) u(x_2) u(1 − x_2).

of which the support is the region B shown in Fig. 4.7. Now, we can obtain the pdf f_{Y_1}(y_1) = ∫_{−∞}^{∞} f_{Y_1,Y_2}(y_1, y_2) dy_2 of Y_1 = X_1 + X_2 as f_{Y_1}(y_1) = ∫_{y_1/2}^{y_1} 24 (y_1 − y_2)(1 − y_2) dy_2 for 0 ≤ y_1 ≤ 1 and f_{Y_1}(y_1) = ∫_{y_1/2}^{1} 24 (y_1 − y_2)(1 − y_2) dy_2 for 1 ≤ y_1 ≤ 2, i.e.,

f_{Y_1}(y_1) = { −2y_1^3 + 3y_1^2,               0 ≤ y_1 ≤ 1,
             { 2y_1^3 − 9y_1^2 + 12y_1 − 4,    1 ≤ y_1 ≤ 2,   (4.2.19)
             { 0,                              otherwise

based^7 on (4.2.18). Here, we have ∫_{−∞}^{∞} f_{Y_1}(v) dv = ∫_0^1 (−2v^3 + 3v^2) dv + ∫_1^2 (2v^3 − 9v^2 + 12v − 4) dv = [−v^4/2 + v^3]_0^1 + [v^4/2 − 3v^3 + 6v^2 − 4v]_1^2 = 1. The pdf f_{Y_1}(y_1) is also shown in Fig. 4.7. ♦
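The piecewise polynomial (4.2.19) can be checked by evaluating the defining integral numerically. The sketch below (illustrative only; the midpoint-rule grid size is an assumption) computes f_{Y_1}(y_1) = ∫ f_{X_1,X_2}(y_1 − y_2, y_2) dy_2 at a few points and compares with (4.2.19):

```python
import numpy as np

def joint_pdf(x1, x2):
    # f_{X1,X2}(x1, x2) = 24 x1 (1 - x2) on the support 0 <= x1 <= x2 <= 1
    return np.where((x1 >= 0) & (x2 >= x1) & (x2 <= 1), 24.0 * x1 * (1.0 - x2), 0.0)

def f_y1_exact(y1):
    # piecewise polynomial of Eq. (4.2.19)
    if 0.0 <= y1 <= 1.0:
        return -2.0 * y1**3 + 3.0 * y1**2
    if 1.0 < y1 <= 2.0:
        return 2.0 * y1**3 - 9.0 * y1**2 + 12.0 * y1 - 4.0
    return 0.0

# midpoint rule on [0, 1], where the y2-integrand is supported
n = 200_000
h = 1.0 / n
y2 = (np.arange(n) + 0.5) * h
max_err = max(
    abs(np.sum(joint_pdf(y1 - y2, y2)) * h - f_y1_exact(y1))
    for y1 in (0.3, 0.8, 1.2, 1.7)
)
```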

The distribution of differences can be obtained similarly.

Example 4.2.6 Obtain the pdf of Y = X_1 − X_2 when the joint pdf of X = (X_1, X_2) is f_X.

Solution Following steps similar to those for (4.2.13), we get f_{X_1−X_2,X_2}(y_1, y_2) = f_{X_1,X_2}(y_1 + y_2, y_2). We then have

f_{X_1−X_2}(y) = ∫_{−∞}^{∞} f_{X_1,X_2}(y + y_2, y_2) dy_2   (4.2.20)

from integration. ♦

Example 4.2.7 When the joint pdf of X and Y is f_{X,Y}(x, y) = (1/4) u(x) u(2 − x) u(y) u(2 − y), obtain the pdf of V = X − Y.

Solution We first have

f_{X,Y}(v + y, y) = (1/4) u(v + y) u(2 − v − y) u(y) u(2 − y).   (4.2.21)

Then, from (4.2.20), we get

f_V(v) = { ∫_{−v}^{2} (1/4) dy,      −2 < v < 0,
        { ∫_{0}^{−v+2} (1/4) dy,    0 < v < 2,   (4.2.22)
        { 0,                        otherwise,

^7 Because u(y_1 − y_2) u(2y_2 − y_1) is non-zero only when {(y_1, y_2) : y_2 < y_1 < 2y_2} = {y_1 : y_2 < y_1 < 2y_2} ∩ {y_2 : y_2 > 0}, we have f_{Y_2}(y_2) = ∫_{−∞}^{∞} 24 (y_1 − y_2)(1 − y_2) u(y_1 − y_2) u(2y_2 − y_1) u(1 − y_2) dy_1 = (1 − y_2) ∫_{y_2}^{2y_2} 24 (y_1 − y_2) dy_1 u(1 − y_2) u(y_2) = 24 (1 − y_2) [y_1^2/2 − y_2 y_1]_{y_2}^{2y_2} u(1 − y_2) u(y_2) = 12 (1 − y_2) y_2^2 u(1 − y_2) u(y_2), which is the same as (4.2.17): note that we have chosen Y_2 = X_2.

Fig. 4.8 The region of integration in obtaining the pdf of V = X − Y when the joint pdf of (X, Y) is f_{X,Y}(x, y) = (1/4) u(x) u(2 − x) u(y) u(2 − y)

i.e.,

f_V(v) = { (v + 2)/4,    −2 < v < 0,
        { −(v − 2)/4,   0 < v < 2,   (4.2.23)
        { 0,            otherwise,

for which Fig. 4.8 can be used to identify the upper and lower limits of the integration. ♦

Example 4.2.8 Obtain the pdf of Z = XY for a random vector (X, Y) with the joint pdf f_{X,Y}.

Solution Let W = X. Then, X = W, Y = Z/W, and the Jacobian of the inverse transformation (X, Y) = g^{-1}(Z, W) = (W, Z/W) is |∂g^{-1}(z, w)/∂(z, w)| = det[[∂x/∂z, ∂y/∂z], [∂x/∂w, ∂y/∂w]] = −1/w. Thus, we have f_{Z,W}(z, w) = (1/|w|) f_{X,Y}(w, z/w). We can then obtain the pdf f_Z(z) = ∫_{−∞}^{∞} f_{Z,W}(z, w) dw of Z = XY as

f_Z(z) = ∫_{−∞}^{∞} f_{X,Y}(x, z/x) (1/|x|) dx
       = ∫_{−∞}^{∞} f_{X,Y}(z/y, y) (1/|y|) dy   (4.2.24)

after integration. ♦

Example 4.2.9 When X_1 and X_2 are independent of each other with the identical pdf f(x) = u(x) u(1 − x), obtain the pdf of Z = X_1 X_2.

Solution The joint pdf of (X_1, X_2) is f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2) = u(x_1) u(1 − x_1) u(x_2) u(1 − x_2) because X_1 and X_2 are independent of each other. Using (4.2.24), we then get

f_Z(z) = ∫_{−∞}^{∞} u(z/y) u(1 − z/y) u(y) u(1 − y) (1/|y|) dy.   (4.2.25)

Fig. 4.9 The pdf f_Z(z) = (−ln z) u(z) u(1 − z) of Z = X_1 X_2 when X_1 and X_2 are i.i.d. with the marginal pdf f(x) = u(x) u(1 − x)


Now, noting that the integrand of (4.2.25) is non-zero only when {(y, z) : 0 < y < 1, 0 < z/y < 1} = {(y, z) : 0 < z < y < 1} = {(y, z) : z < y < 1, 0 < z < 1}, we have f_Z(z) = ∫_z^1 (1/y) dy u(z) u(1 − z), i.e.,

f_Z(z) = (−ln z) u(z) u(1 − z),   (4.2.26)

which is shown in Fig. 4.9. ♦
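A quick Monte Carlo sanity check of (4.2.26) (illustrative; the sample size and seed are arbitrary choices): since f_Z(z) = −ln z on (0, 1), P(Z ≤ 1/2) = [z − z ln z]_0^{1/2} = 1/2 + (1/2) ln 2 ≈ 0.8466.

```python
import numpy as np

# Z = X1 * X2 with X1, X2 i.i.d. Uniform(0, 1)
rng = np.random.default_rng(0)
n = 1_000_000
z = rng.random(n) * rng.random(n)
p_est = np.mean(z <= 0.5)                 # empirical P(Z <= 1/2)
p_true = 0.5 - 0.5 * np.log(0.5)          # from f_Z(z) = -ln z on (0, 1)
```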

Example 4.2.10 Obtain the pdf of Z = X/Y when the joint pdf of X and Y is f_{X,Y}.

Solution Let V = Y. Then, X = ZV, Y = V, and the Jacobian of the inverse transformation (X, Y) = g^{-1}(Z, V) = (ZV, V) is |∂(zv, v)/∂(z, v)| = det[[v, 0], [z, 1]] = v. Thus, we have f_{Z,V}(z, v) = |v| f_{X,Y}(zv, v). Finally, we get the pdf f_Z(z) = ∫_{−∞}^{∞} f_{Z,V}(z, v) dv of Z = X/Y as

f_Z(z) = ∫_{−∞}^{∞} |y| f_{X,Y}(zy, y) dy   (4.2.27)

from integration. ♦

Example 4.2.11 Obtain the pdf of Z = X/Y when X and Y are i.i.d. with the marginal pdf f(x) = u(x) u(1 − x).

Solution We have f_Z(z) = ∫_{−∞}^{∞} |y| u(zy) u(y) u(1 − zy) u(1 − y) dy from (4.2.27). Next, noting that {(y, z) : zy > 0, y > 0, 1 − zy > 0, 1 − y > 0} is the same as {(y, z) : z > 0, 0 < y < min(1, 1/z)}, the pdf of Z = X/Y can be obtained as f_Z(z) = ∫_0^{min(1, 1/z)} y dy u(z), i.e.,

f_Z(z) = { 0,          z < 0,
        { 1/2,        0 < z ≤ 1,   (4.2.28)
        { 1/(2z^2),   z ≥ 1.

Fig. 4.10 The pdf f_Z(z) of Z = X/Y when X and Y are i.i.d. with the marginal pdf f(x) = u(x) u(1 − x)

Note that the value f_Z(0) is not, and does not need to be, specified. Figure 4.10 shows the pdf (4.2.28) of Z = X/Y. ♦
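The pdf (4.2.28) implies P(Z ≤ 1) = 1/2 and P(Z > 2) = ∫_2^∞ dz/(2z^2) = 1/4, which a short simulation can confirm (illustrative sketch; sample size and seed are arbitrary):

```python
import numpy as np

# Z = X / Y with X, Y i.i.d. Uniform(0, 1)
rng = np.random.default_rng(1)
n = 1_000_000
z = rng.random(n) / rng.random(n)
p_le_1 = np.mean(z <= 1.0)   # should be about 1/2
p_gt_2 = np.mean(z > 2.0)    # should be about 1/4
```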

4.2.3 Joint Cumulative Distribution Function

Theorem 4.2.1 is useful when we obtain the joint pdf f_Y of Y = g(X) directly from the joint pdf f_X of X. In some cases, it is more convenient and easier to obtain the joint pdf f_Y after first obtaining the joint cdf F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) as

F_Y(y) = P(X ∈ A_y).   (4.2.29)

Here, {Y ≤ y} denotes {Y_1 ≤ y_1, Y_2 ≤ y_2, ..., Y_n ≤ y_n}, and A_y denotes the inverse image of {Y ≤ y}, i.e., the region of X such that {Y ≤ y} = {X ∈ A_y}. For example, when n = 1, if g(x) is non-decreasing at every point x and has an inverse function, we have {X ∈ A_y} = {X ≤ g^{-1}(y)} as we observed in Chap. 3, and we get F_Y(y) = F_X(g^{-1}(y)).

Example 4.2.12 When the joint pdf of (X, Y) is

f_{X,Y}(x, y) = 1/(2πσ^2) exp(−(x^2 + y^2)/(2σ^2)),   (4.2.30)

obtain the joint pdf of Z = X^2 + Y^2 and W = Y/X.

Solution The joint cdf F_{Z,W}(z, w) = P(Z ≤ z, W ≤ w) of (Z, W) is

F_{Z,W}(z, w) = P(X^2 + Y^2 ≤ z, Y/X ≤ w)
             = ∫∫_{D_{zw}} 1/(2πσ^2) exp(−(x^2 + y^2)/(2σ^2)) dx dy   (4.2.31)

for z ≥ 0, where D_{zw} is the union of the two fan shapes {(x, y) : x^2 + y^2 ≤ z, x > 0, y ≤ wx} and {(x, y) : x^2 + y^2 ≤ z, x < 0, y ≥ wx}, both when w > 0 and when w < 0. Changing the integration in the perpendicular coordinate system into that in the polar coordinate system as indicated in Fig. 4.1 and noting the symmetry of f_{X,Y}, we get F_{Z,W}(z, w) = 2 ∫_{θ=−π/2}^{θ_w} ∫_{r=0}^{√z} 1/(2πσ^2) exp(−r^2/(2σ^2)) r dr dθ = (1/(πσ^2)) (θ_w + π/2) [−σ^2 exp(−r^2/(2σ^2))]_{r=0}^{√z}, i.e.,

F_{Z,W}(z, w) = (1/(2π)) (π + 2 tan^{-1} w) {1 − exp(−z/(2σ^2))},   (4.2.32)

where θ_w = tan^{-1} w ∈ (−π/2, π/2). ♦
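The closed form (4.2.32) can be checked by simulation (illustrative sketch; σ, the evaluation point (z, w), the sample size, and the seed are arbitrary choices):

```python
import numpy as np

# X, Y i.i.d. N(0, sigma^2); Z = X^2 + Y^2, W = Y / X
rng = np.random.default_rng(2)
sigma = 1.5
n = 1_000_000
x = sigma * rng.standard_normal(n)
y = sigma * rng.standard_normal(n)
z0, w0 = 4.0, 0.5
p_est = np.mean((x**2 + y**2 <= z0) & (y / x <= w0))
# Eq. (4.2.32)
p_true = (np.pi + 2.0 * np.arctan(w0)) * (1.0 - np.exp(-z0 / (2.0 * sigma**2))) / (2.0 * np.pi)
```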

Recollect that we obtained the total probability theorems (2.4.13), (3.4.8), and (3.4.9) based on P(A|B)P(B) = P(AB) derived from (2.4.1). Now, extending the results into the multi-dimensional space, we similarly^8 have

P(A) = { ∫_{all x} P(A|X = x) f_X(x) dx,   continuous random vector X,
      { Σ_{all x} P(A|X = x) p_X(x),      discrete random vector X,   (4.2.33)

which are useful in obtaining the cdf, pdf, and pmf in some cases.

Example 4.2.13 Obtain the pdf of Y = X_1 + X_2 when X = (X_1, X_2) has the joint pdf f_X.

Solution This problem has already been discussed in Example 4.2.4 based on the pdf. We now consider the problem based on the cdf.
(Method 1) Recollecting (4.2.33), the cdf F_Y(y) = P(X_1 + X_2 ≤ y) of Y can be expressed as

F_Y(y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} P(X_1 + X_2 ≤ y | X_1 = x_1, X_2 = x_2) f_X(x_1, x_2) dx_1 dx_2.   (4.2.34)

Here, {X_1 + X_2 ≤ y | X_1 = x_1, X_2 = x_2} does and does not hold true when x_1 + x_2 ≤ y and x_1 + x_2 > y, respectively. Thus, we have

P(X_1 + X_2 ≤ y | X_1 = x_1, X_2 = x_2) = { 1,   x_1 + x_2 ≤ y,
                                         { 0,   x_1 + x_2 > y   (4.2.35)

and the cdf of Y can be expressed as

^8 Conditional distribution in random vectors will be discussed in Sect. 4.4 in more detail.

Fig. 4.11 The region A = {(X_1, X_2) : X_1 + X_2 ≤ y} and the interval (−∞, y − x_2) of integration for the value x_1 of X_1 when the value of X_2 is x_2

F_Y(y) = ∫∫_{x_1 + x_2 ≤ y} f_X(x_1, x_2) dx_1 dx_2
      = ∫_{−∞}^{∞} ∫_{−∞}^{y−x_2} f_X(x_1, x_2) dx_1 dx_2.   (4.2.36)

Then, the pdf f_Y(y) = ∂F_Y(y)/∂y of Y is

f_Y(y) = ∫_{−∞}^{∞} f_X(y − x_2, x_2) dx_2,   (4.2.37)

which can also be expressed as f_Y(y) = ∫_{−∞}^{∞} f_X(x_1, y − x_1) dx_1. In obtaining (4.2.37), we used ∂/∂y ∫_{−∞}^{y−x_2} f_X(x_1, x_2) dx_1 = f_X(y − x_2, x_2) from Leibnitz's rule (3.2.18).
(Method 2) Referring to the region A = {(X_1, X_2) : X_1 + X_2 ≤ y} shown in Fig. 4.11, the value x_1 of X_1 runs from −∞ to y − x_2 when the value x_2 of X_2 runs from −∞ to ∞. Thus we have F_Y(y) = P(Y ≤ y) = P(X_1 + X_2 ≤ y) = ∫∫_A f_X(x_1, x_2) dx_1 dx_2,^9 i.e.,

F_Y(y) = ∫_{x_2=−∞}^{∞} ∫_{x_1=−∞}^{y−x_2} f_X(x_1, x_2) dx_1 dx_2,   (4.2.38)

and subsequently (4.2.37). ♦

Example 4.2.14 Obtain the pdf of Z = X + Y when X with the pdf f_X(x) = αe^{−αx} u(x) and Y with the pdf f_Y(y) = βe^{−βy} u(y) are independent of each other.

Solution We first obtain the pdf of Z directly. The joint pdf of X and Y is f_{X,Y}(x, y) = f_X(x) f_Y(y) = αβ e^{−αx} e^{−βy} u(x) u(y). Recollecting that u(y) u(z − y) is non-zero only when 0 < y < z, the pdf of Z = X + Y can be obtained as f_Z(z) = ∫_{−∞}^{∞} αβ e^{−α(z−y)} e^{−βy} u(y) u(z − y) dy = αβ e^{−αz} ∫_0^z e^{(α−β)y} dy u(z), i.e.,

f_Z(z) = { αβ/(β − α) (e^{−αz} − e^{−βz}) u(z),   β ≠ α,
        { α^2 z e^{−αz} u(z),                    β = α   (4.2.39)

from (4.2.37).
Next, the cdf of Z can be expressed as

F_Z(z) = αβ ∫_{−∞}^{∞} ∫_{−∞}^{z−y} e^{−αx} e^{−βy} u(x) u(y) dx dy   (4.2.40)

based on (4.2.38). Here, (4.2.40) is non-zero only when {x > 0, y > 0, z − y > 0} due to u(x) u(y). With this fact in mind and by noting that {y > 0, z − y > 0} = {z > y > 0}, we can rewrite (4.2.40) as

F_Z(z) = αβ ∫_0^z ∫_0^{z−y} e^{−αx} e^{−βy} dx dy u(z)
      = β ∫_0^z {1 − e^{−α(z−y)}} e^{−βy} dy u(z)
      = { [1 − (1/(β − α)) (β e^{−αz} − α e^{−βz})] u(z),   β ≠ α,
        { [1 − (1 + αz) e^{−αz}] u(z),                      β = α.   (4.2.41)

By differentiating this cdf, we can obtain the pdf (4.2.39). ♦

^9 If the order of integration is interchanged, then ∫_{x_2=−∞}^{∞} ∫_{x_1=−∞}^{y−x_2} f_X(x_1, x_2) dx_1 dx_2 will become ∫_{x_1=−∞}^{∞} ∫_{x_2=−∞}^{y−x_1} f_X(x_1, x_2) dx_2 dx_1.
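The cdf (4.2.41) for β ≠ α can be confirmed by simulating the sum of two independent exponential variables (illustrative sketch; α, β, the evaluation point, the sample size, and the seed are arbitrary):

```python
import numpy as np

# X ~ Exp(alpha), Y ~ Exp(beta), independent; Z = X + Y
rng = np.random.default_rng(3)
alpha, beta, z0 = 1.0, 2.0, 1.5
n = 1_000_000
z = rng.exponential(1.0 / alpha, n) + rng.exponential(1.0 / beta, n)
p_est = np.mean(z <= z0)
# Eq. (4.2.41), case beta != alpha
p_true = 1.0 - (beta * np.exp(-alpha * z0) - alpha * np.exp(-beta * z0)) / (beta - alpha)
```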

Example 4.2.15 For a continuous random vector (X, Y), let Z = max(X, Y) and W = min(X, Y). Referring to Fig. 4.12, we first have F_Z(z) = P(max(X, Y) ≤ z) = P(X ≤ z, Y ≤ z), i.e.,

F_Z(z) = F_{X,Y}(z, z).   (4.2.42)

Fig. 4.12 The region {(X, Y) : max(X, Y) ≤ z}

Fig. 4.13 The region {(X, Y) : min(X, Y) ≤ w}

Next, when A = {X ≤ w} and B = {Y ≤ w}, we have

P(X > w, Y > w) = 1 − F_X(w) − F_Y(w) + F_{X,Y}(w, w)   (4.2.43)

from P(X > w, Y > w) = P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − P(A) − P(B) + P(A ∩ B), P(A) = P(X ≤ w) = F_X(w), P(B) = P(Y ≤ w) = F_Y(w), and P(A ∩ B) = P(X ≤ w, Y ≤ w) = F_{X,Y}(w, w). Therefore, we get the cdf F_W(w) = P(W ≤ w) = 1 − P(W > w) = 1 − P(min(X, Y) > w) = 1 − P(X > w, Y > w) of W as

F_W(w) = F_X(w) + F_Y(w) − F_{X,Y}(w, w),   (4.2.44)

which can also be obtained intuitively from Fig. 4.13. Note that the pdf f_Z(z) = (d/dz) F_{X,Y}(z, z) of Z = max(X, Y) becomes

f_Z(z) = 2 F(z) f(z)   (4.2.45)

and the pdf f_W(w) = (d/dw) {F_X(w) + F_Y(w) − F_{X,Y}(w, w)} of W = min(X, Y) becomes

f_W(w) = 2 {1 − F(w)} f(w)   (4.2.46)

when X and Y are i.i.d. with the marginal cdf F and marginal pdf f. ♦
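For X and Y i.i.d. Uniform(0, 1), (4.2.42) gives F_Z(z) = z^2 and (4.2.44) gives F_W(w) = 2w − w^2, so P(Z ≤ 1/2) = 1/4 and P(W ≤ 1/2) = 3/4. A short simulation (illustrative; sample size and seed are arbitrary) confirms this:

```python
import numpy as np

# Z = max(X, Y), W = min(X, Y) for X, Y i.i.d. Uniform(0, 1)
rng = np.random.default_rng(4)
n = 1_000_000
x, y = rng.random(n), rng.random(n)
p_max = np.mean(np.maximum(x, y) <= 0.5)   # F_Z(1/2) = 1/4
p_min = np.mean(np.minimum(x, y) <= 0.5)   # F_W(1/2) = 3/4
```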

The generalization of Z = max(X, Y) and W = min(X, Y) discussed in Example 4.2.15 and Exercise 4.31 is referred to as the order statistic (David and Nagaraja 2003).

4.2.4 Functions of Discrete Random Vectors

Considering that the pmf, unlike the pdf, represents a probability, we now discuss functions of discrete random vectors.

Example 4.2.16 (Rohatgi and Saleh 2001) Obtain the pmf of Z = X + Y and the pmf of W = X − Y when X ∼ b(n, p) and Y ∼ b(n, p) are independent of each other.

Solution First, the pmf P(Z = z) = Σ_{k=0}^{n} P(X = k, Y = z − k) of Z = X + Y can be obtained as P(Z = z) = Σ_{k=0}^{n} {}_nC_k p^k (1 − p)^{n−k} {}_nC_{z−k} p^{z−k} (1 − p)^{n−z+k} = Σ_{k=0}^{n} {}_nC_k {}_nC_{z−k} p^z (1 − p)^{2n−z}, i.e.,

P(Z = z) = {}_{2n}C_z p^z (1 − p)^{2n−z}   (4.2.47)

for z = 0, 1, ..., 2n, where we have used Σ_{k=0}^{n} {}_nC_k {}_nC_{z−k} = {}_{2n}C_z based on (1.A.25).
Next, the pmf P(W = w) = Σ_{k=0}^{n} P(X = k + w, Y = k) = Σ_{k=0}^{n} {}_nC_{k+w} {}_nC_k p^{2k+w} (1 − p)^{2n−2k−w} of W = X − Y can be obtained as

P(W = w) = (p/(1 − p))^w Σ_{k=0}^{n} {}_nC_{k+w} {}_nC_k p^{2k} (1 − p)^{2n−2k}   (4.2.48)

for w = −n, −n + 1, ..., n. ♦
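The identity behind (4.2.47), the Vandermonde convolution, can be verified exactly (illustrative sketch; the particular n and p are arbitrary):

```python
from math import comb

# For independent X, Y ~ b(n, p), the pmf of Z = X + Y from discrete
# convolution must equal C(2n, z) p^z (1-p)^{2n-z}  (Eq. 4.2.47).
n, p = 7, 0.3
max_diff = 0.0
for z in range(2 * n + 1):
    conv = sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        * comb(n, z - k) * p**(z - k) * (1 - p)**(n - z + k)
        for k in range(max(0, z - n), min(n, z) + 1)
    )
    exact = comb(2 * n, z) * p**z * (1 - p)**(2 * n - z)
    max_diff = max(max_diff, abs(conv - exact))
```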
Example 4.2.17 Assume that X and Y are i.i.d. with the marginal pmf p(x) = (1 − α)α^{x−1} ũ(x − 1), where 0 < α < 1 and ũ(x) is the discrete space unit step function defined in (1.4.17). Obtain the joint pmf of (X + Y, X), and based on the result, obtain the pmf of X and the pmf of X + Y.

Solution First we have p_{X+Y,X}(v, x) = P(X + Y = v, X = x) = P(X = x, Y = v − x), i.e.,

p_{X+Y,X}(v, x) = (1 − α)^2 α^{v−2} ũ(x − 1) ũ(v − x − 1).   (4.2.49)

Thus, we have p_{X+Y}(v) = Σ_{x=−∞}^{∞} p_{X+Y,X}(v, x), i.e.,

p_{X+Y}(v) = (1 − α)^2 α^{v−2} Σ_{x=−∞}^{∞} ũ(x − 1) ũ(v − x − 1)   (4.2.50)

from (4.2.49). Now noting that ũ(x − 1) ũ(v − x − 1) = 1 for {x − 1 ≥ 0, v − x − 1 ≥ 0} and 0 otherwise and that^{10} {x : x − 1 ≥ 0, v − x − 1 ≥ 0} = {x : 1 ≤ x ≤ v − 1, v ≥ 2}, we have p_{X+Y}(v) = (1 − α)^2 α^{v−2} Σ_{x=1}^{v−1} ũ(v − 2), i.e.,

p_{X+Y}(v) = (1 − α)^2 (v − 1) α^{v−2} ũ(v − 2).   (4.2.51)

Next, the pmf of X can be obtained as p_X(x) = Σ_{v=−∞}^{∞} p_{X+Y,X}(v, x), i.e.,

p_X(x) = (1 − α)^2 α^{−2} Σ_{v=−∞}^{∞} α^v ũ(x − 1) ũ(v − x − 1)   (4.2.52)

from (4.2.49), which can be rewritten as p_X(x) = (1 − α)^2 α^{−2} Σ_{v=x+1}^{∞} α^v ũ(x − 1), i.e.,

p_X(x) = (1 − α)α^{x−1} ũ(x − 1)   (4.2.53)

by noting that {v : x − 1 ≥ 0, v − x − 1 ≥ 0} = {v : v ≥ x + 1, x ≥ 1} and ũ(x − 1) ũ(v − x − 1) = 1 for {x − 1 ≥ 0, v − x − 1 ≥ 0} and 0 otherwise. ♦

^{10} Here, v − x − 1 ≥ 0, for example, can more specifically be written as v − x − 1 = 0, 1, ....
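The negative-binomial-shaped pmf (4.2.51) can be verified as the discrete convolution of two geometric pmfs (illustrative sketch; the value of α and the truncation points are arbitrary):

```python
# Two i.i.d. geometric variables with pmf p(x) = (1 - a) a^{x-1}, x = 1, 2, ...
a = 0.6

def geom(x):
    return (1 - a) * a ** (x - 1) if x >= 1 else 0.0

# convolution vs the closed form (1 - a)^2 (v - 1) a^{v-2} of Eq. (4.2.51)
max_diff = 0.0
for v in range(2, 60):
    conv = sum(geom(x) * geom(v - x) for x in range(1, v))
    exact = (1 - a) ** 2 * (v - 1) * a ** (v - 2)
    max_diff = max(max_diff, abs(conv - exact))

# the closed form should sum to 1 over v = 2, 3, ... (truncated tail is negligible)
total = sum((1 - a) ** 2 * (v - 1) * a ** (v - 2) for v in range(2, 400))
```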

4.3 Expected Values and Joint Moments

For random vectors, we will describe here the basic properties (Balakrishnan 1992; Kendall and Stuart 1979; Samorodnitsky and Taqqu 1994) of expected values. New notions will also be defined and explored.

4.3.1 Expected Values

The expected values for random vectors can be described as, for example,

E{g(X)} = ∫ g(x) dF_X(x)   (4.3.1)

by extending the notion of the expected values discussed in Chap. 3 into multiple dimensions. Because the expectation is a linear operator, we have

E{Σ_{i=1}^{n} a_i g_i(X_i)} = Σ_{i=1}^{n} a_i E{g_i(X_i)}   (4.3.2)

for an n-dimensional random vector X when {g_i}_{i=1}^{n} are all measurable functions. In addition,

E{Π_{i=1}^{n} g_i(X_i)} = Π_{i=1}^{n} E{g_i(X_i)}   (4.3.3)

when X is an independent random vector.

Example 4.3.1 Assume that we repeatedly roll a fair die until the number of even-numbered outcomes is 10. Let N denote the number of rolls and X_i denote the number of outcome i when the repetition ends. Obtain the pmf of N, expected value of N, expected value of X_1, and expected value of X_2.

Solution First, the pmf of N can be obtained as P(N = k) = P(A_k ∩ B) = P(A_k | B) P(B) = P(A_k) P(B) = (1/2) {}_{k−1}C_9 (1/2)^9 (1/2)^{k−1−9}, i.e.,

P(N = k) = {}_{k−1}C_9 (1/2)^k   (4.3.4)

for k = 10, 11, ..., where A_k = {an even number at the k-th rolling} and B = {9 times of even numbers until the (k−1)-st rolling}. Using Σ_{x=0}^{∞} {}_{r+x−1}C_x (1 − α)^x = α^{−r} shown in (2.5.16) and noting that k {}_{k−1}C_9 = k (k−1)!/((k−10)! 9!) = 10 k!/((k−10)! 10!) = 10 {}_kC_{10}, we get E{N} = Σ_{k=10}^{∞} {}_{k−1}C_9 (1/2)^k k = 10 Σ_{k=10}^{∞} {}_kC_{10} (1/2)^k = (10/2^{10}) Σ_{j=0}^{∞} {}_{j+10}C_{10} (1/2)^j = 10 · 2^{−10} (1/2)^{−11} = 20. This result can also be obtained from the formula (3.E.27) of the mean of the NB distribution with the pmf (2.5.17) by using (r, p) = (10, 1/2).
Subsequently, until the end, even numbers will occur 10 times, among which 2, 4, and 6 will occur equally likely. Thus, E{X_2} = 10/3. Next, from N = Σ_{i=1}^{6} X_i, we get E{N} = Σ_{i=1}^{6} E{X_i}. Here, because E{X_2} = E{X_4} = E{X_6} = 10/3 and E{X_1} = E{X_3} = E{X_5}, the expected value^{11} of X_1 is E{X_1} = (20 − 3 × 10/3) × 1/3 = 10/3. ♦
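The values E{N} = 20 and E{X_1} = 10/3 can be checked by direct simulation of the rolling experiment (illustrative sketch; the number of trials and the seed are arbitrary):

```python
import random

# Roll a fair die until 10 even outcomes have appeared; record the total
# number of rolls N and the number of 1's among the outcomes.
random.seed(5)
trials = 100_000
total_n = total_ones = 0
for _ in range(trials):
    n_rolls = ones = evens = 0
    while evens < 10:
        face = random.randint(1, 6)
        n_rolls += 1
        if face % 2 == 0:
            evens += 1
        elif face == 1:
            ones += 1
    total_n += n_rolls
    total_ones += ones
mean_n = total_n / trials        # should be near 20
mean_ones = total_ones / trials  # should be near 10/3
```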

4.3.2 Joint Moments

We now generalize the concept of moments discussed in Chap. 3 for random vectors. The moments for bi-variate random vectors will first be considered and then those for higher dimensions will be discussed.

^{11} The expected values of X_1 and X_2 can of course be obtained with the pmf's of X_1 and X_2 obtained already in Example 4.1.11.

4.3.2.1 Bi-variate Random Vectors

Definition 4.3.1 (joint moment; joint central moment) The expected value

m_{jk} = E{X^j Y^k}   (4.3.5)

is termed the (j, k)-th joint moment or product moment of X and Y, and

μ_{jk} = E{(X − m_X)^j (Y − m_Y)^k}   (4.3.6)

is termed the (j, k)-th joint central moment or product central moment of X and Y, for j, k = 0, 1, ..., where m_X and m_Y are the means of X and Y, respectively.
It is easy to see that m_{00} = μ_{00} = 1, m_{10} = m_X = E{X}, m_{01} = m_Y = E{Y}, m_{20} = E{X^2}, m_{02} = E{Y^2}, μ_{10} = μ_{01} = 0, μ_{20} = σ_X^2 is the variance of X, and μ_{02} = σ_Y^2 is the variance of Y.

Example 4.3.2 The expected value E{X_1 X_2^3} is the (1, 3)-rd joint moment of X = (X_1, X_2). ♦

Definition 4.3.2 (correlation; covariance) The (1, 1)-st joint moment m_{11} and the (1, 1)-st joint central moment μ_{11} are termed the correlation and covariance, respectively, of the two random variables. The ratio of the covariance to the product of the standard deviations of two random variables is termed the correlation coefficient.
The correlation m_{11} = E{XY} is often denoted^{12} by R_{XY}, and the covariance μ_{11} = E{(X − m_X)(Y − m_Y)} = E{XY} − m_X m_Y by K_{XY}, Cov(X, Y), or C_{XY}. Specifically, we have

K_{XY} = R_{XY} − m_X m_Y   (4.3.7)

for the covariance, and

ρ_{XY} = K_{XY} / √(σ_X^2 σ_Y^2)   (4.3.8)

for the correlation coefficient.

Definition 4.3.3 (orthogonal; uncorrelated) When the correlation is 0 or, equivalently, when the mean of the product is 0, the two random variables are called orthogonal. When the mean of the product is the same as the product of the means or, equivalently, when the covariance or correlation coefficient is 0, the two random variables are called uncorrelated.

^{12} When there is more than one subscript, we need commas in some cases: for example, the joint pdf f_{X,Y} of (X, Y) should be differentiated from the pdf f_{XY} of the product XY. In other cases, we do not need to use commas: for instance, R_{XY}, μ_{jk}, K_{XY}, ... denote relations among two or more random variables and thus are expressed without any comma.

In other words, when R X Y = E{X Y } = 0, X and Y are orthogonal. When ρ X Y =


0, K X Y = Cov(X, Y ) = 0, or E{X Y } = E{X }E{Y }, X and Y are uncorrelated.

Theorem 4.3.1 If two random variables are independent of each other, then they
are uncorrelated, but the converse is not necessarily true.

In other words, there exist some uncorrelated random variables that are not inde-
pendent of each other. In addition, when two random variables are independent and
at least one of them has mean 0, the two random variables are orthogonal.

Theorem 4.3.2 The absolute value of a correlation coefficient is no larger than 1.

   
Proof From the Cauchy-Schwarz inequality E2 {X Y } ≤ E X 2 E Y 2 shown in
(6.A.26), we get
   
E2 {(X − m X ) (Y − m Y )} ≤ E (X − m X )2 E (Y − m Y )2 , (4.3.9)

which implies K X2 Y ≤ σ X2 σY2 . Thus, ρ X2 Y ≤ 1 and |ρ X Y | ≤ 1. ♠

Example 4.3.3 When the two random variables X and Y are related by Y − m Y =
c (X − m X ) or Y = cX + d, we have |ρ X Y | = 1. ♦

4.3.2.2 Multi-dimensional Random Vectors

Let E{X} = m_X = (m_1, m_2, ..., m_n)^T be the mean vector of X = (X_1, X_2, ..., X_n)^T. In subsequent discussions, especially when we discuss joint moments of random vectors, we will often assume the random vectors are complex. The discussion on complex random vectors is almost the same as that on real random vectors on which we have so far focused.

Definition 4.3.4 (correlation matrix; covariance matrix) The matrix

R_X = E{X X^H}   (4.3.10)

is termed the correlation matrix and the matrix

K_X = R_X − m_X m_X^H   (4.3.11)

is termed the covariance matrix or variance-covariance matrix of X, where the superscript H denotes the complex conjugate transpose, also called the Hermitian transpose or Hermitian conjugate.
The correlation matrix R_X = [R_{ij}] is of size n × n: the (i, j)-th element of R_X is the correlation R_{ij} = R_{X_i X_j} = E{X_i X_j^*} between X_i and X_j when i ≠ j and the second absolute moment R_{ii} = E{|X_i|^2} when i = j. The covariance matrix K_X = [K_{ij}] is also an n × n matrix: the (i, j)-th element of K_X is the covariance K_{ij} = K_{X_i X_j} = E{(X_i − m_i)(X_j − m_j)^*} of X_i and X_j when i ≠ j and the variance K_{ii} = Var(X_i) of X_i when i = j.
Example 4.3.4 For a random vector X = (X_1, X_2, ..., X_n)^T and an n × n linear transformation matrix L = [L_{ij}], consider the random vector

Y = L X,   (4.3.12)

where Y = (Y_1, Y_2, ..., Y_n)^T. Then, letting m_X = (m_{X_1}, m_{X_2}, ..., m_{X_n})^T be the mean vector of X, the mean vector m_Y = E{Y} = (m_{Y_1}, m_{Y_2}, ..., m_{Y_n})^T = E{L X} = L E{X} of Y can be obtained as

m_Y = L m_X.   (4.3.13)

Similarly, denoting by R_X and K_X the correlation and covariance matrices, respectively, of X, the correlation matrix R_Y = E{Y Y^H} = E{L X (L X)^H} = L E{X X^H} L^H of Y can be expressed as

R_Y = L R_X L^H,   (4.3.14)

and the covariance matrix K_Y = E{(Y − m_Y)(Y − m_Y)^H} = R_Y − m_Y m_Y^H = L(R_X − m_X m_X^H) L^H of Y can be expressed as

K_Y = L K_X L^H.   (4.3.15)

More generally, when Y = L X + b, we have m_Y = L m_X + b, R_Y = L R_X L^H + L m_X b^H + b (L m_X)^H + b b^H, and K_Y = L K_X L^H. In essence, the results (4.3.13)-(4.3.15), shown in Fig. 4.14 as a visual representation, dictate that the mean vector, correlation matrix, and covariance matrix of Y = L X can be obtained without first having to obtain the cdf, pdf, or pmf of Y, an observation similar to that on (3.3.9).

 
Example 4.3.5 Assume m_X = (1, 2)^T and K_X = [[2, −1], [−1, 1]] for X = (X_1, X_2)^T. When L = [[1, 1], [−1, 1]], obtain m_Y and K_Y for Y = L X.

Solution We easily get the mean m_Y = L m_X = (3, 1)^T of Y = L X. Next, the covariance matrix of Y = L X is K_Y = L K_X L^H = [[1, 1], [−1, 1]] [[2, −1], [−1, 1]] [[1, −1], [1, 1]] = [[1, −1], [−1, 5]]. ♦
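The matrix computations of Example 4.3.5 can be reproduced mechanically with NumPy (for these real matrices the Hermitian transpose reduces to the ordinary transpose):

```python
import numpy as np

# m_Y = L m_X and K_Y = L K_X L^H for the matrices of Example 4.3.5
m_x = np.array([1.0, 2.0])
k_x = np.array([[2.0, -1.0], [-1.0, 1.0]])
ell = np.array([[1.0, 1.0], [-1.0, 1.0]])   # the transformation L
m_y = ell @ m_x                              # expected (3, 1)^T
k_y = ell @ k_x @ ell.T                      # expected [[1, -1], [-1, 5]]
```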

Fig. 4.14 The mean vector, correlation matrix, and covariance matrix of a linear transformation: X with {m_X, R_X, K_X} is mapped by L to Y = L X with m_Y = L m_X, R_Y = L R_X L^H, and K_Y = L K_X L^H
Theorem 4.3.3 The correlation and covariance matrices are Hermitian.^{13}

Proof From R_{X_i X_j} = E{X_i X_j^*} = (E{X_j X_i^*})^* = R_{X_j X_i}^* for the correlation of X_i and X_j, the correlation matrix is Hermitian. Similarly, it is easy to see that the covariance matrix is also Hermitian by letting Y_i = X_i − m_i. ♠

Theorem 4.3.4 The correlation and covariance matrices of any random vector are positive semi-definite.

Proof Let a = (a_1, a_2, ..., a_n) and X = (X_1, X_2, ..., X_n)^T. Then, the correlation matrix R_X = E{X X^H} is positive semi-definite because E{|a X|^2} = E{a X X^H a^H} = a E{X X^H} a^H ≥ 0. Letting Y_i = X_i − m_i, we can similarly show that the covariance matrix is positive semi-definite. ♠

Definition 4.3.5 (uncorrelated random vector) A random vector X is called an uncorrelated random vector if E{X_i X_j^*} = E{X_i} E{X_j^*} for all i and j such that i ≠ j.

For an uncorrelated random vector X, we have the correlation matrix R_X = [R_{X_i X_j}] with

R_{X_i X_j} = { E{|X_i|^2},        i = j,
             { E{X_i} E{X_j^*},   i ≠ j   (4.3.16)

and the covariance matrix K_X = [K_{X_i X_j}] with K_{X_i X_j} = σ_{X_i}^2 δ_{ij}, where

δ_{ij} = { 1,   i = j,
        { 0,   i ≠ j   (4.3.17)

and

δ_k = { 1,   k = 0,
      { 0,   k ≠ 0   (4.3.18)

^{13} A matrix A such that A^H = A is called Hermitian.



are called the Kronecker delta function. In some cases, an uncorrelated random vector is referred to as a linearly independent random vector. In addition, a random vector X = (X_1, X_2, ..., X_n) is called a linearly dependent random vector if there exists a vector a = (a_1, a_2, ..., a_n) ≠ 0 such that a_1 X_1 + a_2 X_2 + ··· + a_n X_n = 0.

Definition 4.3.6 (random vectors uncorrelated with each other) When we have E{X_i Y_j^*} = E{X_i} E{Y_j^*} for all i and j, the random vectors X and Y are called uncorrelated with each other.

Note that even when X and Y are uncorrelated with each other, each of X and Y may or may not be an uncorrelated random vector, and even when X and Y are both uncorrelated random vectors, X and Y may be correlated.
uncorrelated random vectors, X and Y may be correlated.
Theorem 4.3.5 (McDonough and Whalen 1995) If the covariance matrix of a random vector is positive definite, then the random vector can be transformed into an uncorrelated random vector via a linear transformation.

Proof The theorem can be proved by noting that, when an n × n matrix A is a normal^{14} matrix, we can take n orthogonal unit vectors as the eigenvectors of A and that there exists a unitary^{15} matrix P such that P^{-1} A P = P^H A P is diagonal.
Let {λ_i}_{i=1}^{n} be the eigenvalues of the positive definite covariance matrix K_X of X. Because a covariance matrix is Hermitian, {λ_i}_{i=1}^{n} are all real. In addition, because K_X is positive definite, {λ_i}_{i=1}^{n} are all larger than 0. Now, choose the eigenvectors corresponding to the eigenvalues {λ_i}_{i=1}^{n} as

{a_i}_{i=1}^{n} = {(a_{i1}, a_{i2}, ..., a_{in})^T}_{i=1}^{n},   (4.3.19)

respectively, so that the eigenvectors are orthonormal, that is,

a_i^H a_j = δ_{ij}.   (4.3.20)

Now, consider the unitary matrix

A = (a_1, a_2, ..., a_n)^H = [a_{ij}^*]   (4.3.21)

composed of the eigenvectors (4.3.19) of K_X. Because K_X a_i = λ_i a_i, and therefore K_X (a_1, a_2, ..., a_n) = (λ_1 a_1, λ_2 a_2, ..., λ_n a_n), the covariance matrix K_Y = A K_X A^H of Y = (Y_1, Y_2, ..., Y_n)^T = A X can be obtained as

^{14} A matrix A such that A A^H = A^H A is normal.
^{15} A matrix A is unitary if A^H = A^{-1} or if A^H A = A A^H = I. In the real space, a unitary matrix is referred to as an orthogonal matrix. A Hermitian matrix is always a normal matrix and a unitary matrix is always a normal matrix, but the converses are not necessarily true: for example, [[1, −1], [1, 1]] is normal, but is neither Hermitian nor unitary.

Fig. 4.15 Decorrelating into uncorrelated unit-variance random vectors: X with positive definite K_X and eigenpairs K_X a_i = λ_i a_i, a_i^H a_j = δ_{ij}, A = (a_1, a_2, ..., a_n)^H, is mapped by λ̃A, with λ̃ = diag(1/√λ_1, 1/√λ_2, ..., 1/√λ_n), to Z = λ̃AX with K_Z = I

K_Y = (a_1, a_2, ..., a_n)^H (λ_1 a_1, λ_2 a_2, ..., λ_n a_n)
    = [λ_j a_i^H a_j]
    = diag(λ_1, λ_2, ..., λ_n)   (4.3.22)

from (4.3.15) using (4.3.20). In essence, Y = A X is an uncorrelated random vector. ♠


Let us proceed one step further from Theorem 4.3.5. Recollecting that the eigenvalues {λ_i}_{i=1}^{n} of the covariance matrix K_X are all larger than 0, let

λ̃ = diag(1/√λ_1, 1/√λ_2, ..., 1/√λ_n)   (4.3.23)

and consider the linear transformation

Z = λ̃ Y   (4.3.24)

of Y. Then, the covariance matrix K_Z = λ̃ K_Y λ̃^H of Z is

K_Z = I.   (4.3.25)

In other words, Z = λ̃ Y = λ̃ A X is a vector of uncorrelated unit-variance random variables. Figure 4.15 summarizes the procedure.

Example 4.3.6 Assume that the covariance matrix of X = (X_1, X_2)^T is K_X = [[13, 12], [12, 13]]. Find a linear transformation that decorrelates X into an uncorrelated unit-variance random vector.

Solution From the characteristic equation |λI − K_X| = 0, we get the two pairs λ_1 = 25, a_1 = (1/√2)(1, 1)^T and λ_2 = 1, a_2 = (1/√2)(1, −1)^T of eigenvalue and unit eigenvector of K_X. With the linear transformation C = λ̃ A = [[1/√25, 0], [0, 1/√1]] (1/√2) [[1, 1], [1, −1]] = (1/√50) [[1, 1], [5, −5]] constructed from the two pairs, the covariance matrix K_W = C K_X C^H of W = C X is K_W = (1/50) [[1, 1], [5, −5]] [[25, 5], [25, −5]] = [[1, 0], [0, 1]]. In other words, C is a linear transformation that decorrelates X into an uncorrelated unit-variance random vector. Note that A = (1/√2) [[1, 1], [1, −1]] is a unitary matrix.
Meanwhile, for B = (1/5) [[−2, 3], [3, −2]], the covariance matrix of Y = B X is K_Y = B K_X B^H = (1/25) [[−2, 3], [3, −2]] [[10, 15], [15, 10]] = [[1, 0], [0, 1]]. In other words, like C, the transformation B also decorrelates X into an uncorrelated unit-variance random vector. In addition, from C K_X C^H = I and B K_X B^H = I, we get C^H C = B^H B = (1/25) [[13, −12], [−12, 13]] = K_X^{-1}. ♦
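The construction of Theorem 4.3.5 applied in Example 4.3.6 can be carried out numerically; the sketch below builds C = λ̃A from the eigendecomposition of K_X (NumPy's `eigh` may order or sign the eigenvectors differently from the example, which does not affect the result):

```python
import numpy as np

# Decorrelate K_X = [[13, 12], [12, 13]] into an identity covariance
k_x = np.array([[13.0, 12.0], [12.0, 13.0]])
lam, vecs = np.linalg.eigh(k_x)            # eigenvalues 1 and 25, orthonormal eigenvectors
a = vecs.T                                 # rows are eigenvectors: A = (a_1, ..., a_n)^H (real case)
c = np.diag(1.0 / np.sqrt(lam)) @ a        # C = diag(1/sqrt(lambda_i)) A
k_w = c @ k_x @ c.T                        # covariance of W = C X, should be I
```

As noted after Example 4.3.7, the decorrelating transformation is not unique, but C^H C = K_X^{-1} holds for any such transformation.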

Example 4.3.7 When U = (U_1, U_2)^T has mean vector (10, 0)^T and covariance matrix [[4, 1], [1, 1]], consider the linear transformation V = (V_1, V_2)^T = L U = [[−2, 5], [1, 1]] [U_1, U_2]^T = (−2U_1 + 5U_2, U_1 + U_2)^T of U. Then, the mean vector of V is E{V} = L E{U} = [[−2, 5], [1, 1]] (10, 0)^T = (−20, 10)^T. In addition, the covariance matrix of V is K_V = L K_U L^H = [[21, 0], [0, 7]]. ♦

Example 4.3.6 implies that the decorrelating linear transformation is generally not unique.

4.3.3 Joint Characteristic Function and Joint Moment Generating Function

By extending the notion of the cf and mgf discussed in Sect. 3.3.4, we introduce and discuss the joint cf and joint mgf of multi-dimensional random vectors.

Definition 4.3.7 (joint cf) The function

φ_X(ω) = E{exp(jω^T X)}   (4.3.26)

is the joint cf of X = (X_1, X_2, ..., X_n)^T, where ω = (ω_1, ω_2, ..., ω_n)^T.

The joint cf φ_X(ω) of X can be expressed as

φ_X(ω) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f_X(x) exp(jω^T x) dx   (4.3.27)
4.3 Expected Values and Joint Moments 295

when X is a continuous random vector, where x = (x1 , x2 , . . . , xn )T and d x =


d x1 d x2 · · · d xn . Thus, the joint cf ϕ X (ω) is the complex conjugate of the multi-
dimensional Fourier transform F { f X (x)}  ∞ pdf f X (x). Clearly,
 ∞ of∞the joint we can
obtain the joint pdf as f X (x) = (2π) 1
n −∞ −∞ · · · ϕ
−∞ X (ω) exp − jω T
x dω by
inverse transforming the joint cf.

Definition 4.3.8 (joint mgf) The function

M_X(t) = E{exp(t^T X)}    (4.3.28)

is the joint mgf of X, where t = (t_1, t_2, . . ., t_n)^T.

The joint mgf M X (t) is the multi-dimensional Laplace transform L { f X (x)} of


the joint pdf f X (x) with t replaced by −t. The joint pdf can be obtained from the
inverse Laplace transform of the joint mgf.
The marginal cf and marginal mgf can be obtained from the joint cf and joint mgf, respectively. For example, the marginal cf ϕ_{X_i}(ω_i) of X_i can be obtained as

ϕ_{X_i}(ω_i) = ϕ_X(ω)|_{ω_j = 0 for all j ≠ i}    (4.3.29)

from the joint cf ϕ_X(ω), and the marginal mgf M_{X_i}(t_i) of X_i as

M_{X_i}(t_i) = M_X(t)|_{t_j = 0 for all j ≠ i}    (4.3.30)

from the joint mgf M_X(t).


When X is an independent random vector, it is easy to see from Theorem 4.1.3 that the joint cf ϕ_X(ω) is the product

ϕ_X(ω) = ∏_{i=1}^{n} ϕ_{X_i}(ω_i)    (4.3.31)

of marginal cf's. Because cf's and distributions are related by one-to-one correspondences as we discussed in Theorem 3.3.2, a random vector whose joint cf is the product of the marginal cf's is an independent random vector.
Example 4.3.8 For an independent random vector X, let Y = Σ_{i=1}^{n} X_i. Then, the cf ϕ_Y(ω) = E{e^{jωY}} = E{e^{jωX_1} e^{jωX_2} · · · e^{jωX_n}} of Y can be expressed as

ϕ_Y(ω) = ∏_{i=1}^{n} ϕ_{X_i}(ω).    (4.3.32)

By inverse transforming the cf (4.3.32), we can get the pdf f_Y(y) of Y, which is the convolution

f_{X_1+X_2+···+X_n} = f_{X_1} ∗ f_{X_2} ∗ · · · ∗ f_{X_n}    (4.3.33)

of the marginal pdf's. The result (4.2.15) is a special case of (4.3.33) with n = 2. ♦
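The convolution result (4.3.33) can be illustrated concretely: for two independent U(0, 1) random variables the convolution of the marginal pdf's must reproduce the well-known triangular pdf of the sum. A quick numerical sketch (plain Python; the grid size and tolerance are arbitrary choices):

```python
def f_uniform(x):
    # pdf of U(0, 1)
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def conv_at(y, n=4000):
    # midpoint Riemann-sum approximation of (f * f)(y) = integral of f(x) f(y - x) dx
    dx = 1.0 / n
    return sum(f_uniform((k + 0.5) * dx) * f_uniform(y - (k + 0.5) * dx) * dx
               for k in range(n))

def f_sum(y):
    # exact pdf of U(0,1) + U(0,1): triangular on (0, 2)
    if 0.0 <= y <= 1.0:
        return y
    if 1.0 < y <= 2.0:
        return 2.0 - y
    return 0.0

for y in [0.25, 0.5, 0.9, 1.3, 1.75]:
    assert abs(conv_at(y) - f_sum(y)) < 1e-3, y
print("f_{X1+X2} matches the convolution of the marginal pdf's")
```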
Example 4.3.9 Show that Y = Σ_{i=1}^{n} X_i ∼ P(Σ_{i=1}^{n} λ_i) when {X_i ∼ P(λ_i)}_{i=1}^{n} are independent of each other.

Solution The cf for the distribution P(λ_i) is ϕ_{X_i}(ω) = exp{λ_i(e^{jω} − 1)}. Thus, the cf ϕ_Y(ω) = ∏_{i=1}^{n} exp{λ_i(e^{jω} − 1)} of Y = Σ_{i=1}^{n} X_i can be expressed as

ϕ_Y(ω) = exp{(Σ_{i=1}^{n} λ_i)(e^{jω} − 1)}    (4.3.34)

from (4.3.32), confirming the desired result. ♦
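The Poisson closure in Example 4.3.9 can also be confirmed at the pmf level: convolving the pmf's of P(λ_1) and P(λ_2) term by term must reproduce the pmf of P(λ_1 + λ_2). A brief sketch (plain Python; the rates and the truncation point are arbitrary):

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    # P(X = k) for X ~ P(lam)
    return exp(-lam) * lam ** k / factorial(k)

lam1, lam2 = 1.5, 2.5
for m in range(12):
    # (p1 * p2)(m) = sum_k p1(k) p2(m - k), the pmf convolution
    conv = sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, m - k)
               for k in range(m + 1))
    assert abs(conv - poisson_pmf(lam1 + lam2, m)) < 1e-12
print("P(1.5) * P(2.5) = P(4.0), pmf by pmf")
```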


In Sect. 6.2.1, we will discuss again the sum of a number of independent random
variables. As we observed in (4.3.32), when X and Y are independent of each other,
the cf of X + Y is the product of the cf’s of X and Y . On the other hand, the converse
does not hold true: when the cf of X + Y is the product of the cf’s of X and Y , X and
Y may or may not be independent of each other. Specifically, assume X 1 ∼ F1 and
X 2 ∼ F2 are independent of each other, where Fi is the cdf of X i for i = 1, 2. Then,
X 1 + X 2 ∼ F1 ∗ F2 and, if X 1 and X 2 are absolutely continuous random variables
with pdf f_1 and f_2, respectively, X_1 + X_2 ∼ f_1 ∗ f_2. Yet, even when X_1 and X_2 are not independent of each other, in some cases we have X_1 + X_2 ∼ f_1 ∗ f_2 (Romano and Siegel 1986; Stoyanov 2013; Wise and Hall 1993). Such cases include non-independent Cauchy random variables and the case shown in the example below.
Example 4.3.10 Assume the joint pmf p_{X,Y}(x, y) = P(X = x, Y = y)

p_{X,Y}(1, 1) = 1/9, p_{X,Y}(1, 2) = 1/18, p_{X,Y}(1, 3) = 1/6,
p_{X,Y}(2, 1) = 1/6, p_{X,Y}(2, 2) = 1/9, p_{X,Y}(2, 3) = 1/18,    (4.3.35)
p_{X,Y}(3, 1) = 1/18, p_{X,Y}(3, 2) = 1/6, p_{X,Y}(3, 3) = 1/9

of a discrete random vector (X, Y). Then, the pmf p_X(x) = P(X = x) of X is p_X(1) = p_X(2) = p_X(3) = 1/3, and the pmf p_Y(y) = P(Y = y) of Y is p_Y(1) = p_Y(2) = p_Y(3) = 1/3. Since, for instance, p_{X,Y}(1, 2) = 1/18 ≠ p_X(1)p_Y(2) = 1/9, X and Y are not independent of each other. However, the mgf's of X and Y are both M_X(t) = M_Y(t) = (1/3)(e^t + e^{2t} + e^{3t}), and the mgf of X + Y is M_{X+Y}(t) = (1/9)(e^{2t} + 2e^{3t} + 3e^{4t} + 2e^{5t} + e^{6t}) = M_X(t)M_Y(t). ♦
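The claims in Example 4.3.10 are mechanical to verify with exact rational arithmetic; the sketch below (plain Python with the fractions module) checks the uniform marginals, the failure of independence, and the fact that the pmf of X + Y equals the convolution of the marginals, which is exactly the condition M_{X+Y}(t) = M_X(t)M_Y(t).

```python
from fractions import Fraction as F

# joint pmf (4.3.35)
p = {(1, 1): F(1, 9),  (1, 2): F(1, 18), (1, 3): F(1, 6),
     (2, 1): F(1, 6),  (2, 2): F(1, 9),  (2, 3): F(1, 18),
     (3, 1): F(1, 18), (3, 2): F(1, 6),  (3, 3): F(1, 9)}

pX = {x: sum(p[x, y] for y in (1, 2, 3)) for x in (1, 2, 3)}
pY = {y: sum(p[x, y] for x in (1, 2, 3)) for y in (1, 2, 3)}
assert all(pX[x] == F(1, 3) for x in pX) and all(pY[y] == F(1, 3) for y in pY)

# (X, Y) is not independent: e.g. p(1, 2) != pX(1) pY(2)
assert p[1, 2] != pX[1] * pY[2]

# pmf of X + Y equals the convolution of the marginals,
# so M_{X+Y}(t) = M_X(t) M_Y(t) even though X and Y are dependent
p_sum, p_conv = {}, {}
for x in (1, 2, 3):
    for y in (1, 2, 3):
        p_sum[x + y] = p_sum.get(x + y, F(0)) + p[x, y]
        p_conv[x + y] = p_conv.get(x + y, F(0)) + pX[x] * pY[y]
assert p_sum == p_conv
print("dependent (X, Y) with M_{X+Y} = M_X M_Y confirmed")
```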

We observed in Theorem 4.1.3 that, when X and Y are independent of each other and g and h are continuous functions, g(X) and h(Y) are independent of each other. We now discuss whether or not g(X) and h(Y) are uncorrelated when X and Y are uncorrelated.

Theorem 4.3.6 When X and Y are independent, g(X ) and h(Y ) are uncorrelated.
However, when X and Y are uncorrelated but not independent, g(X ) and h(Y ) are
not necessarily uncorrelated.

Proof When X and Y are independent of each other, we have E{g(X)h(Y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f_{X,Y}(x, y) dx dy = ∫_{−∞}^{∞} g(x) f_X(x) dx ∫_{−∞}^{∞} h(y) f_Y(y) dy = E{g(X)}E{h(Y)}, and thus g(X) and h(Y) are uncorrelated. Next, when X and Y are uncorrelated but are not independent of each other, assume that g(X) and h(Y) are uncorrelated for every choice of g and h. Then, we have E{e^{j(ω_1 X + ω_2 Y)}} = E{e^{jω_1 X}}E{e^{jω_2 Y}} from E{g(X)h(Y)} = E{g(X)}E{h(Y)} with g(x) = e^{jω_1 x} and h(y) = e^{jω_2 y}. This result implies that the joint cf ϕ_{X,Y}(ω_1, ω_2) of X and Y can be expressed as ϕ_{X,Y}(ω_1, ω_2) = E{e^{j(ω_1 X + ω_2 Y)}} = E{e^{jω_1 X}}E{e^{jω_2 Y}} = ϕ_X(ω_1)ϕ_Y(ω_2) in terms of the marginal cf's of X and Y, contradicting the assumption that X and Y are not independent of each other. In short, E{XY} = E{X}E{Y} does not imply E{g(X)h(Y)} = E{g(X)}E{h(Y)}: when X and Y are uncorrelated but are not independent of each other, g(X) and h(Y) are not necessarily uncorrelated. ♠

The joint moments of random vectors can be easily obtained by using the joint cf
or joint mgf as shown in the following theorem:
   
Theorem 4.3.7 The joint moment m_{k_1 k_2 ··· k_n} = E{X_1^{k_1} X_2^{k_2} · · · X_n^{k_n}} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} x_1^{k_1} x_2^{k_2} · · · x_n^{k_n} f_X(x) dx can be obtained as

m_{k_1 k_2 ··· k_n} = j^{−K} ∂^K ϕ_X(ω) / ∂ω_1^{k_1} ∂ω_2^{k_2} · · · ∂ω_n^{k_n} |_{ω=0}    (4.3.36)

from the joint cf ϕ_X(ω), where K = Σ_{i=1}^{n} k_i.

From the joint mgf M_X(t), we can obtain the joint moment m_{k_1 k_2 ··· k_n} also as

m_{k_1 k_2 ··· k_n} = ∂^K M_X(t) / ∂t_1^{k_1} ∂t_2^{k_2} · · · ∂t_n^{k_n} |_{t=0}.    (4.3.37)

As a special case of (4.3.37) for the two-dimensional random vector (X, Y), we have

m_{kr} = ∂^{k+r} M_{X,Y}(t_1, t_2) / ∂t_1^k ∂t_2^r |_{(t_1,t_2)=(0,0)},    (4.3.38)

where M_{X,Y}(t_1, t_2) = E{exp(t_1 X + t_2 Y)} is the joint mgf of (X, Y).



Example 4.3.11 (Romano and Siegel 1986) When the two functions G and H are equal, we have F ∗ G = F ∗ H. The converse does not always hold true, i.e., F ∗ G = F ∗ H does not necessarily imply G = H. Let us consider an example. Assume

P(X = x) = { 1/2, x = 0;
             2/{π^2 (2n−1)^2}, x = ±(2n−1)π, n = 1, 2, . . .;
             0, otherwise    (4.3.39)

for a random variable X. Then, the cf ϕ_X(t) = 1/2 + Σ_{n=1}^{∞} 4/{π^2 (2n−1)^2} cos{(2n−1)πt} of X is a train of triangular pulses with period 2 and

ϕ_X(t) = 1 − |t|    (4.3.40)

for |t| ≤ 1. Meanwhile, when the distribution is

P(Y = x) = { 4/{π^2 (2n−1)^2}, x = ±(2n−1)π/2, n = 1, 2, . . .;
             0, otherwise    (4.3.41)

for Y, the cf is also a train of triangular pulses, with period 4 and

ϕ_Y(t) = 1 − |t|    (4.3.42)

for |t| ≤ 2. It is easy to see that ϕ_X(t) = ϕ_Y(t) for |t| ≤ 1 and that |ϕ_X(t)| = |ϕ_Y(t)| for all t from (4.3.40) and (4.3.42). Now, for a random variable Z with the pdf

f_Z(x) = { (1 − cos x)/(πx^2), x ≠ 0;
           1/(2π), x = 0,    (4.3.43)

we have the cf ϕ_Z(t) = (1 − |t|)u(1 − |t|). Then, we have ϕ_Z(t)ϕ_X(t) = ϕ_Z(t)ϕ_Y(t) and F_Z(x) ∗ F_X(x) = F_Z(x) ∗ F_Y(x), but F_X(x) ≠ F_Y(x), where F_X, F_Y, and F_Z denote the cdf's of X, Y, and Z, respectively. ♦

4.4 Conditional Distributions

In this section, we discuss conditional probability functions (Ross 2009) and condi-
tional expected values mainly for bi-variate random vectors.

4.4.1 Conditional Probability Functions

We first extend the discussion on the conditional distribution explored in Sect. 3.4.
When the event A is assumed, the conditional joint cdf^16 F_{Z,W|A}(z, w) = P(Z ≤ z, W ≤ w|A) of Z and W is

F_{Z,W|A}(z, w) = P(Z ≤ z, W ≤ w, A) / P(A).    (4.4.1)

The conditional joint pdf^17 can be obtained as

f_{Z,W|A}(z, w) = ∂^2/∂z∂w F_{Z,W|A}(z, w)    (4.4.2)

by differentiating the conditional joint cdf F_{Z,W|A}(z, w) with respect to z and w.


Example 4.4.1 Obtain the conditional joint cdf F_{X,Y|A}(x, y) and the conditional joint pdf f_{X,Y|A}(x, y) under the condition A = {X ≤ x}.

Solution Recollecting that P(X ≤ x, Y ≤ y|X ≤ x) = P(X ≤ x, Y ≤ y)/P(X ≤ x), we get the conditional joint cdf F_{X,Y|A}(x, y) = F_{X,Y|X≤x}(x, y) = P(X ≤ x, Y ≤ y)/P(X ≤ x) as

F_{X,Y|X≤x}(x, y) = F_{X,Y}(x, y) / F_X(x).    (4.4.3)

Differentiating the conditional joint cdf (4.4.3) with respect to x and y, we get the conditional joint pdf f_{X,Y|A}(x, y) as

f_{X,Y|X≤x}(x, y) = ∂/∂x {(1/F_X(x)) ∂/∂y F_{X,Y}(x, y)}
                 = f_{X,Y}(x, y)/F_X(x) − {f_X(x)/F_X^2(x)} ∂/∂y F_{X,Y}(x, y).    (4.4.4)

By writing F_{Y|X≤x}(y) as F_{Y|X≤x}(y) = P(X ≤ x, Y ≤ y)/P(X ≤ x), i.e.,

F_{Y|X≤x}(y) = F_{X,Y}(x, y) / F_X(x),    (4.4.5)

we get

F_{Y|X≤x}(y) = F_{X,Y|X≤x}(x, y)    (4.4.6)

from (4.4.3) and (4.4.5). ♦

16 As in other cases, the conditional joint cdf is also referred to as the conditional cdf if it does not
cause any ambiguity.
17 The conditional joint pdf is also referred to as the conditional pdf if it does not cause any ambiguity.

We now discuss the conditional distribution when the condition is expressed in


terms of random variables.

Definition 4.4.1 (conditional cdf; conditional pdf; conditional pmf) For a random
vector (X, Y ), P(Y ≤ y|X = x) is called the conditional joint cdf, or simply the
conditional cdf, of Y given X = x, and is written as FX,Y |X =x (x, y), FY |X =x (y),
or F_{Y|X}(y|x). For a continuous random vector (X, Y), the derivative (∂/∂y)F_{Y|X}(y|x),
denoted by f Y |X (y|x), is called the conditional pdf of Y given X = x. For a discrete
random vector (X, Y ), P(Y = y|X = x) is called the conditional joint pmf or the
conditional pmf of Y given X = x and is written as p X,Y |X =x (x, y), pY |X =x (y), or
pY |X (y|x).

The relationships among the conditional pdf f Y |X (y|x), joint pdf f X,Y (x, y), and
marginal pdf f X (x) and those among the conditional pmf pY |X (y|x), joint pmf
p X,Y (x, y), and marginal pmf p X (x) are described in the following theorem:

Theorem 4.4.1 The conditional pmf p_{Y|X}(y|x) can be expressed as

p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x)    (4.4.7)

when (X, Y) is a discrete random vector. Similarly, the conditional pdf f_{Y|X}(y|x) can be expressed as

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x)    (4.4.8)

when (X, Y) is a continuous random vector.


Proof For a discrete random vector, we easily get p_{Y|X}(y|x) = P(X = x, Y = y)/P(X = x) = p_{X,Y}(x, y)/p_X(x). For a continuous random vector, we have F_{Y|X}(y|x) = P(Y ≤ y|X = x) = lim_{dx→0} P(x − dx < X ≤ x, Y ≤ y)/P(x − dx < X ≤ x) as

F_{Y|X}(y|x) = lim_{dx→0} [{F_{X,Y}(x, y) − F_{X,Y}(x − dx, y)}/dx] / [{F_X(x) − F_X(x − dx)}/dx]
            = (1/f_X(x)) ∂/∂x F_{X,Y}(x, y)    (4.4.9)

and consequently, f_{Y|X}(y|x) = ∂/∂y F_{Y|X}(y|x) = (1/f_X(x)) ∂^2/∂x∂y F_{X,Y}(x, y), which is the same as (4.4.8). ♠
We can similarly obtain the conditional pdf f_{X|Y}(x|y) = (∂/∂x)F_{X|Y}(x|y) as

f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y),    (4.4.10)

which can also be obtained directly from (4.4.8) by replacing X, Y, x, and y with Y, X, y, and x, respectively. Note the similarity among (4.4.7), (4.4.8), and (2.4.1).

Example 4.4.2 (Ross 1976) Obtain the conditional pdf f_{X|Y}(x|y) when the joint pdf of (X, Y) is f_{X,Y}(x, y) = 6xy(2 − x − y)u(x)u(1 − x)u(y)u(1 − y).

Solution Employing (4.4.10) and noting that f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y)dx = ∫_0^1 6xy(2 − x − y)dx, we have f_{X|Y}(x|y) = 6xy(2 − x − y) / ∫_0^1 6xy(2 − x − y)dx, i.e.,

f_{X|Y}(x|y) = 6x(2 − x − y) / (4 − 3y)    (4.4.11)

for 0 < x < 1 and 0 < y < 1. ♦
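Formula (4.4.11) can be sanity-checked numerically: for each fixed y in (0, 1), f_{X|Y}(x|y) should be nonnegative on (0, 1) and integrate to 1 over x. A brief sketch (plain Python; the grid size and test points are arbitrary):

```python
def f_cond(x, y):
    # conditional pdf f_{X|Y}(x|y) from (4.4.11)
    return 6.0 * x * (2.0 - x - y) / (4.0 - 3.0 * y)

n = 2000
for y in [0.1, 0.5, 0.9]:
    dx = 1.0 / n
    pts = [(k + 0.5) * dx for k in range(n)]
    total = sum(f_cond(x, y) * dx for x in pts)   # midpoint Riemann sum
    assert abs(total - 1.0) < 1e-6                # integrates to one in x
    assert all(f_cond(x, y) >= 0.0 for x in pts)  # nonnegative on (0, 1)
print("f_{X|Y}(.|y) is a valid pdf for each fixed y in (0, 1)")
```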

Example 4.4.3 (Ross 1976) Obtain the conditional pdf f_{X|Y}(x|y) when the joint pdf of (X, Y) is f_{X,Y}(x, y) = 4y(x − y)e^{−(x+y)} u(y)u(x − y).

Solution We easily get f_{X|Y}(x|y) = 4y(x − y)e^{−(x+y)} / ∫_y^∞ 4y(x − y)e^{−(x+y)}dx, i.e.,

f_{X|Y}(x|y) = (x − y)e^{−(x−y)}    (4.4.12)

for 0 ≤ y ≤ x < ∞ from (4.4.10). ♦

It is noteworthy that, unlike (4.4.7) or (4.4.8), we have

F_{Y|X}(y|x) ≠ F_{X,Y}(x, y) / F_X(x)    (4.4.13)

from F_{Y|X}(y|x) = P(Y ≤ y|X = x) and F_{X,Y}(x, y)/F_X(x) = P(X ≤ x, Y ≤ y)/P(X ≤ x) = P(Y ≤ y|X ≤ x). Employing (4.4.8) with F_{Y|X}(y|t) = ∫_{−∞}^{y} f_{Y|X}(s|t)ds, we get F_{X,Y}(x, y) = ∫_{−∞}^{x} F_{Y|X}(y|t) f_X(t)dt = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{Y|X}(s|t) f_X(t) ds dt = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(t, s) ds dt from (4.4.9). This result is the same as (4.1.10) or (4.1.21) in that the cdf can be obtained by integrating the pdf.

Theorem 4.4.2 The pmf p_X of X can be expressed as

p_X(x) = Σ_{y=−∞}^{∞} p_{X|Y}(x|y) p_Y(y)    (4.4.14)

for a discrete random vector (X, Y), and the pdf f_X of X can be expressed as

f_X(x) = ∫_{−∞}^{∞} f_{X|Y}(x|y) f_Y(y)dy    (4.4.15)

for a continuous random vector (X, Y).

Theorem 4.4.2 can be easily proved by noting that p_X(x) = Σ_{y=−∞}^{∞} p_{X,Y}(x, y), p_{X,Y}(x, y) = p_{X|Y}(x|y)p_Y(y), f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y)dy, and f_{X,Y}(x, y) = f_{X|Y}(x|y) f_Y(y). We can obtain the following theorem based on (4.1.23), (4.4.8), (4.4.10), and (4.4.15):
Theorem 4.4.3 We can rewrite f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x) as

f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y) / ∫_{−∞}^{∞} f_{X|Y}(x|y) f_Y(y)dy    (4.4.16)

for any x such that f_X(x) > 0 by noting that f_{X,Y}(x, y) = f_{X|Y}(x|y) f_Y(y) and f_X(x) = ∫_{−∞}^{∞} f_{X|Y}(x|y) f_Y(y)dy.

Similarly to (4.4.16), we have

p_{Y|X}(y|x) = p_{X|Y}(x|y) p_Y(y) / Σ_{y=−∞}^{∞} p_{X|Y}(x|y) p_Y(y)    (4.4.17)

for any x such that p_X(x) > 0 when (X, Y) is a discrete random vector.

When X and Y are independent of each other, we have

F_{X|Y}(x|y) = F_X(x)    (4.4.18)

for every point y such that F_Y(y) > 0,

f_{X|Y}(x|y) = f_X(x)    (4.4.19)

for every point y such that f_Y(y) > 0, F_{Y|X}(y|x) = F_Y(y) for every point x such that F_X(x) > 0, and f_{Y|X}(y|x) = f_Y(y) for every point x such that f_X(x) > 0, because F_{X,Y}(x, y) = F_X(x)F_Y(y) and f_{X,Y}(x, y) = f_X(x) f_Y(y).

Example 4.4.4 Assume the pmf

p_X(x) = { 1/6, x = 3; 1/2, x = 4; 1/3, x = 5    (4.4.20)

of X. Consider

Y = { 0, if X = 4; 1, if X = 3 or 5    (4.4.21)

and Z = X − Y. Obtain the conditional pmf's p_{X|Y}, p_{Y|X}, p_{X|Z}, p_{Z|X}, p_{Y|Z}, and p_{Z|Y}, and the joint pmf's p_{X,Y}, p_{Y,Z}, and p_{Z,X}.

Solution We easily get p_Y(y) = P(X = 4) for y = 0 and p_Y(y) = P(X = 3 or 5) for y = 1, i.e.,

p_Y(y) = { 1/2, y = 0; 1/2, y = 1.    (4.4.22)

Next, because Y = 1 and Z = X − Y = 2 when X = 3, Y = 0 and Z = X − Y = 4 when X = 4, and Y = 1 and Z = X − Y = 4 when X = 5, we get p_Z(z) = P(X = 3) for z = 2 and p_Z(z) = P(X = 4 or 5) for z = 4, i.e.,

p_Z(z) = { 1/6, z = 2; 5/6, z = 4.    (4.4.23)

Next, because Y = 1 when X = 3, Y = 0 when X = 4, and Y = 1 when X = 5 from (4.4.21), we get

p_{Y|X}(0|3) = 0, p_{Y|X}(0|4) = 1, p_{Y|X}(0|5) = 0,
p_{Y|X}(1|3) = 1, p_{Y|X}(1|4) = 0, p_{Y|X}(1|5) = 1.    (4.4.24)

Noting that p_{X,Y}(x, y) = p_{Y|X}(y|x) p_X(x) from (4.4.17) and using (4.4.20) and (4.4.24), we get

p_{X,Y}(3, 0) = 0, p_{X,Y}(4, 0) = 3/6, p_{X,Y}(5, 0) = 0,
p_{X,Y}(3, 1) = 1/6, p_{X,Y}(4, 1) = 0, p_{X,Y}(5, 1) = 1/3.    (4.4.25)

Similarly, noting that p_{X|Y}(x|y) = p_{X,Y}(x, y)/p_Y(y) from (4.4.17) and using (4.4.22) and (4.4.25), we get

p_{X|Y}(3|0) = 0, p_{X|Y}(4|0) = 1, p_{X|Y}(5|0) = 0,
p_{X|Y}(3|1) = 1/3, p_{X|Y}(4|1) = 0, p_{X|Y}(5|1) = 2/3.    (4.4.26)

Meanwhile, Y = 1 and Z = 2 when X = 3, Y = 0 and Z = 4 when X = 4, and Y = 1 and Z = 4 when X = 5 from (4.4.21) and the definition of Z. Thus, we have

p_{Z|X}(2|3) = 1, p_{Z|X}(2|4) = 0, p_{Z|X}(2|5) = 0,
p_{Z|X}(4|3) = 0, p_{Z|X}(4|4) = 1, p_{Z|X}(4|5) = 1.    (4.4.27)

Noting that p_{X,Z}(x, z) = p_{Z|X}(z|x) p_X(x) from (4.4.17) and using (4.4.20) and (4.4.27), we get

p_{X,Z}(3, 2) = 1/6, p_{X,Z}(4, 2) = 0, p_{X,Z}(5, 2) = 0,
p_{X,Z}(3, 4) = 0, p_{X,Z}(4, 4) = 3/6, p_{X,Z}(5, 4) = 1/3.    (4.4.28)

Similarly, noting that p_{X|Z}(x|z) = p_{X,Z}(x, z)/p_Z(z) from (4.4.17) and using (4.4.23) and (4.4.28), we get

p_{X|Z}(3|2) = 1, p_{X|Z}(4|2) = 0, p_{X|Z}(5|2) = 0,
p_{X|Z}(3|4) = 0, p_{X|Z}(4|4) = 3/5, p_{X|Z}(5|4) = 2/5.    (4.4.29)

Finally, we have p_{Z|Y}(2|1) = P(X − Y = 2|Y = 1) = P(X = 3|X = 3 or 5) = P(X = 3)/P(X = 3 or 5) as

p_{Z|Y}(2|1) = 1/3    (4.4.30)

and p_{Z|Y}(4|1) = P(X − Y = 4|Y = 1) = P(X = 5)/P(X = 3 or 5) as

p_{Z|Y}(4|1) = 2/3,    (4.4.31)

where we also have p_{Z|Y}(2|0) = P(X − Y = 2|Y = 0) = P(X = 2|X = 4) = 0 and p_{Z|Y}(4|0) = P(X − Y = 4|Y = 0) = P(X = 4|X = 4) = 1, and have used {X = 3} ∩ {X = 3 or 5} = {X = 3}. Therefore, noting that p_{Y,Z}(y, z) = p_{Z|Y}(z|y) p_Y(y) from (4.4.17) and using (4.4.22), (4.4.30), and (4.4.31), we get

p_{Y,Z}(0, 2) = 0, p_{Y,Z}(0, 4) = 1/2, p_{Y,Z}(1, 2) = 1/6, p_{Y,Z}(1, 4) = 1/3.    (4.4.32)

We also get

p_{Y|Z}(0|2) = 0, p_{Y|Z}(0|4) = 3/5, p_{Y|Z}(1|2) = 1, p_{Y|Z}(1|4) = 2/5    (4.4.33)

from p_{Y|Z}(y|z) = p_{Y,Z}(y, z)/p_Z(z) by using (4.4.23) and (4.4.32). ♦
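The bookkeeping in Example 4.4.4 is mechanical and thus well suited to an exact check; the sketch below (plain Python with the fractions module) rebuilds the joint and conditional pmf's directly from the definitions of Y and Z and confirms a few of the derived values.

```python
from fractions import Fraction as F

pX = {3: F(1, 6), 4: F(1, 2), 5: F(1, 3)}     # pmf (4.4.20)
Y_of = {3: 1, 4: 0, 5: 1}                     # Y from (4.4.21)
Z_of = {x: x - Y_of[x] for x in pX}           # Z = X - Y

pZ, pYZ = {}, {}
for x, px in pX.items():
    y, z = Y_of[x], Z_of[x]
    pZ[z] = pZ.get(z, F(0)) + px
    pYZ[y, z] = pYZ.get((y, z), F(0)) + px

assert pZ == {2: F(1, 6), 4: F(5, 6)}                      # (4.4.23)
assert pYZ[1, 2] == F(1, 6) and pYZ[0, 4] == F(1, 2)       # (4.4.32)

# conditional pmf's, e.g. p_{X|Z}(x|4) and p_{Y|Z}(0|4)
pXZ = {(x, Z_of[x]): pX[x] for x in pX}
assert pXZ[4, 4] / pZ[4] == F(3, 5)                        # (4.4.29)
assert pXZ[5, 4] / pZ[4] == F(2, 5)
assert pYZ[0, 4] / pZ[4] == F(3, 5)                        # (4.4.33)
print("conditional pmf tables agree with the worked example")
```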

Example 4.4.5 When the joint pdf of (X, Y) is

f_{X,Y}(x, y) = (1/16) u(2 − |x|) u(2 − |y|),    (4.4.34)

obtain the conditional joint cdf F_{X,Y|A} and the conditional joint pdf f_{X,Y|A} for A = {|X| ≤ 1, |Y| ≤ 1}.

Solution First, we have P(A) = ∫_{−1}^{1} ∫_{−1}^{1} f_{X,Y}(u, v)dudv = (1/16) × 4 = 1/4. Next, for

P(X ≤ x, Y ≤ y, A) = ∫∫_{{|u| ≤ 1, |v| ≤ 1, u ≤ x, v ≤ y}} f_{X,Y}(u, v)dudv,    (4.4.35)

[Fig. 4.16 The region of integration to obtain the conditional joint cdf F_{X,Y|A}(x, y) under the condition A = {|X| ≤ 1, |Y| ≤ 1} when f_{X,Y}(x, y) = (1/16)u(2 − |x|)u(2 − |y|)]

we get the following results by referring to Fig. 4.16: First, P(X ≤ x, Y ≤ y, A) = P(A) = 1/4 when x ≥ 1 and y ≥ 1, and P(X ≤ x, Y ≤ y, A) = P(∅) = 0 when x ≤ −1 or y ≤ −1. In addition, P(X ≤ x, Y ≤ y, A) = ∫∫_{{|u| ≤ 1, −1 ≤ v ≤ y}} f_{X,Y}(u, v)dudv = (1/16) × 2(y + 1) = (1/8)(y + 1) because {|u| ≤ 1, |v| ≤ 1, u ≤ x, v ≤ y} = {|u| ≤ 1, −1 ≤ v ≤ y} when x ≥ 1 and −1 ≤ y ≤ 1. We similarly get P(X ≤ x, Y ≤ y, A) = (1/8)(x + 1) when −1 ≤ x ≤ 1 and y ≥ 1, and P(X ≤ x, Y ≤ y, A) = (1/16)(x + 1)(y + 1) when −1 ≤ x ≤ 1 and −1 ≤ y ≤ 1. Taking these results into account, we get the conditional joint cdf F_{X,Y|A}(x, y) = P(X ≤ x, Y ≤ y, A)/P(A) as

F_{X,Y|A}(x, y) = { 1, x ≥ 1, y ≥ 1;
                   (1/2)(y + 1), x ≥ 1, |y| ≤ 1;
                   (1/4)(x + 1)(y + 1), |x| ≤ 1, |y| ≤ 1;
                   (1/2)(x + 1), |x| ≤ 1, y ≥ 1;
                   0, x ≤ −1 or y ≤ −1.    (4.4.36)

We subsequently get the conditional joint pdf

f_{X,Y|A}(x, y) = (1/4) u(1 − |x|)u(1 − |y|)    (4.4.37)

by differentiating F_{X,Y|A}(x, y). ♦
Let Y = g(X) and g^{−1}(·) be the inverse of the function g(·). We then have

f_{Z|Y}(z|Y = y) = f_{Z|X}(z|X = g^{−1}(y)),    (4.4.38)

which implies that, when the relationship between X and Y can be expressed via an invertible function, conditioning on X = g^{−1}(y) is equivalent to conditioning on Y = y.

Example 4.4.6 Assume that X and Y are related as Y = g(X) = (X_1 + X_2, X_1 − X_2). Then, we have f_{Z|Y}(z|Y = (3, 1)) = f_{Z|X}(z|X = g^{−1}(3, 1)) = f_{Z|X}(z|X = (2, 1)). ♦

4.4.2 Conditional Expected Values

For one random variable, we have discussed the conditional expected value in
(3.4.30). We now extend the discussion into random vectors with the conditioning
event expressed in terms of random variables.

4.4.2.1 Conditional Expected Values in Random Vectors


Let B = {X = x} in the conditional expected value E{Y|B} = ∫_{−∞}^{∞} y f_{Y|B}(y)dy shown in (3.4.30). Then, the conditional expected value m_{Y|X} = E{Y|X = x} of Y is

m_{Y|X} = ∫_{−∞}^{∞} y f_{Y|X}(y|x)dy    (4.4.39)

when X = x.
Example 4.4.7 Obtain the conditional expected value E{X|Y = y} in Example 4.4.2.

Solution We easily get E{X|Y = y} = ∫_0^1 6x^2(2 − x − y)/(4 − 3y) dx = [1/(4 − 3y)] [2(2 − y)x^3 − (3/2)x^4]|_{x=0}^{1} = (5 − 4y)/(8 − 6y) for 0 < y < 1. ♦

The conditional expected value E{Y |X } is a function of X and is thus a random


variable with its value E{Y |X = x} when X = x.

Theorem 4.4.4 The expected value E{Y} can be obtained as

E{E{Y|X}} = E{Y}
          = { ∫_{−∞}^{∞} E{Y|X = x} f_X(x)dx, X is a continuous random variable;
              Σ_{x=−∞}^{∞} E{Y|X = x} p_X(x), X is a discrete random variable    (4.4.40)

from the conditional expected value E{Y|X}.



Proof Considering only the case of a continuous random vector, (4.4.40) can be shown easily as E{E{Y|X}} = ∫_{−∞}^{∞} E{Y|X = x} f_X(x)dx = ∫_{−∞}^{∞} {∫_{−∞}^{∞} y f_{Y|X}(y|x)dy} f_X(x)dx = ∫_{−∞}^{∞} y ∫_{−∞}^{∞} f_{X,Y}(x, y)dx dy = ∫_{−∞}^{∞} y f_Y(y)dy = E{Y}. ♠
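Theorem 4.4.4 is easy to illustrate in the discrete case; the sketch below (plain Python with the fractions module, reusing the joint pmf of Example 4.3.10 purely for concreteness) computes E{Y} once directly and once as the iterated expectation Σ_x E{Y|X = x} p_X(x).

```python
from fractions import Fraction as F

# joint pmf of (X, Y) from Example 4.3.10
p = {(1, 1): F(1, 9),  (1, 2): F(1, 18), (1, 3): F(1, 6),
     (2, 1): F(1, 6),  (2, 2): F(1, 9),  (2, 3): F(1, 18),
     (3, 1): F(1, 18), (3, 2): F(1, 6),  (3, 3): F(1, 9)}

pX = {x: sum(p[x, y] for y in (1, 2, 3)) for x in (1, 2, 3)}

# direct expected value E{Y}
EY = sum(y * p[x, y] for (x, y) in p)

def cond_mean_Y(x):
    # E{Y|X = x} = sum_y y p(x, y) / p_X(x)
    return sum(y * p[x, y] for y in (1, 2, 3)) / pX[x]

# iterated expected value E{E{Y|X}} = sum_x E{Y|X = x} p_X(x)
EEY = sum(cond_mean_Y(x) * pX[x] for x in (1, 2, 3))
assert EY == EEY == F(2)
print("E{E{Y|X}} = E{Y} =", EY)
```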

4.4.2.2 Conditional Expected Values for Functions of Random Vectors

The function g(X, Y) is a function of two random vectors X and Y, while g(x, Y) is a function of a vector x and a random vector Y: in other words, g(x, Y) and g(X, Y) are different from each other. In addition, under the condition X = x, the conditional mean of g(X, Y) is E{g(X, Y)|X = x} = ∫_{all y} g(x, y) f_{Y|X}(y|x)dy, which is the same as the conditional mean E{g(x, Y)|X = x} = ∫_{all y} g(x, y) f_{Y|X}(y|x)dy of g(x, Y). In other words,

E{g(X, Y)|X = x} = E{g(x, Y)|X = x} = ∫_{all y} g(x, y) f_{Y|X}(y|x)dy.    (4.4.41)

Furthermore, for the expected value of the random vector E{g(X, Y)|X}, we have

E{g(X, Y)} = E{E{g(X, Y)|X}} = E{E{g(X, Y)|Y}}    (4.4.42)

from E{E{g(X, Y)|X}} = ∫_{all x} {∫_{all y} g(x, y) f_{Y|X}(y|x)dy} f_X(x)dx = ∫_{all y} {∫_{all x} g(x, y) f_{X,Y}(x, y)dx} dy = ∫_{all y} {∫_{all x} g(x, y) f_{X|Y}(x|y)dx} f_Y(y)dy. The result (4.4.42) can be obtained also from Theorem 4.4.4. When X and Y are both one-dimensional, if we let g(X, Y) = (Y − m_{Y|X})^2, we get the expected value

E{g(X, Y)|X = x} = E{(Y − m_{Y|X})^2 | X = x},    (4.4.43)

which is called the conditional variance of Y when X = x.


   
Example 4.4.8 (Ross 1976) Obtain the conditional expected value E{exp(X/2)|Y = 1} when the joint pdf of (X, Y) is f_{X,Y}(x, y) = (y/2)e^{−xy} u(x)u(y)u(2 − y).

Solution From (4.4.10), we have f_{X|Y}(x|1) = f_{X,Y}(x, 1)/f_Y(1) = (1/2)e^{−x} / ∫_0^∞ (1/2)e^{−x}dx = e^{−x} for 0 < x < ∞. Thus, from (4.4.41), we get E{exp(X/2)|Y = 1} = ∫_0^∞ e^{x/2} e^{−x}dx = 2. ♦

When g(X, Y) = g_1(X)g_2(Y), from (4.4.41) we get

E{g_1(X)g_2(Y)|X = x} = g_1(x)E{g_2(Y)|X = x},    (4.4.44)

which is called the factorization property (Gardner 1990). The factorization property implies that the random vector g_1(X) under the condition X = x, or equivalently g_1(x), is not probabilistic.

4.4.3 Evaluation of Expected Values via Conditioning

As we have observed in Sects. 2.4 and 3.4.3, we can obtain the probability and
expected value more easily by first obtaining the conditional probability and condi-
tional expected value with appropriate conditioning. Let us now discuss how we can
obtain expected values for random vectors by first obtaining the conditional expected
value with appropriate conditioning on random vectors.

Example 4.4.9 Consider the group {1, 1, 2, 2, . . ., n, n} of n pairs of numbers. When we randomly delete m numbers in the group, obtain the expected number of pairs remaining. For n = 20 and m = 10, obtain the value of the expected number.

Solution Denote by J_m the number of pairs remaining after m numbers have been deleted. After m numbers have been deleted, we have 2n − m − 2J_m non-paired numbers. When we delete one more number, the number of pairs is J_m − 1 with probability 2J_m/(2n − m) or J_m with probability 1 − 2J_m/(2n − m). Based on this observation, we have

E{J_{m+1}|J_m} = (J_m − 1) · 2J_m/(2n − m) + J_m {1 − 2J_m/(2n − m)}
              = J_m (2n − m − 2)/(2n − m).    (4.4.45)

Noting that E{E{J_{m+1}|J_m}} = E{J_{m+1}} and E{J_0} = n, we get E{J_{m+1}} = E{J_m} (2n − m − 2)/(2n − m) = E{J_{m−1}} · (2n − m − 2)/(2n − m) · (2n − m − 1)/(2n − m + 1) = · · · = E{J_0} · (2n − m − 2)/(2n − m) · (2n − m − 1)/(2n − m + 1) · · · (2n − 2)/(2n) = (2n − m − 2)(2n − m − 1)/(2(2n − 1)), i.e.,

E{J_m} = (2n − m − 1)(2n − m)/(2(2n − 1)) = n · (_{2n−2}C_m)/(_{2n}C_m).    (4.4.46)

For n = 20 and m = 10, the value is E{J_10} = (29 × 30)/(2 × 39) ≈ 11.15. ♦
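Both forms of (4.4.46) can be cross-checked against each other; this sketch (plain Python; the ranges of n and m tested are arbitrary) verifies the closed form against the binomial-coefficient form and evaluates the n = 20, m = 10 case.

```python
from math import comb

def expected_pairs(n, m):
    # E{J_m} = (2n - m - 1)(2n - m) / (2(2n - 1)) from (4.4.46)
    return (2 * n - m - 1) * (2 * n - m) / (2 * (2 * n - 1))

# closed form agrees with n * C(2n-2, m) / C(2n, m)
for n in range(2, 15):
    for m in range(0, 2 * n - 1):
        alt = n * comb(2 * n - 2, m) / comb(2 * n, m)
        assert abs(expected_pairs(n, m) - alt) < 1e-9

assert abs(expected_pairs(20, 10) - 870 / 78) < 1e-12   # about 11.15
print("E{J_10} for n = 20:", expected_pairs(20, 10))
```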
≈ 11.15. ♦

Example 4.4.10 (Ross 1996) We toss a coin repeatedly until a head appears r times consecutively. When the probability of a head is p, obtain the expected number of repetitions.

Solution Denote by C_k the number of repetitions until the first appearance of k consecutive heads. Then,

C_{k+1} = { C_k + 1, if the (C_k + 1)-st outcome is a head;
            C_k + 1 + C'_{k+1}, if the (C_k + 1)-st outcome is a tail,    (4.4.47)

where C'_{k+1} denotes the additional number of repetitions needed after the restart and has the same distribution as C_{k+1}. Let α_k = E{C_k} for convenience. Then, using (4.4.47) in

E{C_{k+1}} = E{C_{k+1} | (C_k + 1)-st outcome is a head} P(head)
           + E{C_{k+1} | (C_k + 1)-st outcome is a tail} P(tail),    (4.4.48)

we get

α_{k+1} = (α_k + 1) p + (α_{k+1} + α_k + 1)(1 − p).    (4.4.49)

Now, solving (4.4.49), we get α_{k+1} = (1/p)α_k + 1/p = (1/p^2)α_{k−1} + 1/p + 1/p^2 = · · · = (1/p^k)α_1 + 1/p + 1/p^2 + · · · + 1/p^k = Σ_{i=1}^{k+1} 1/p^i because α_1 = E{C_1} = 1 × p + 2(1 − p)p + 3(1 − p)^2 p + · · · = 1/p. In other words,

α_k = Σ_{i=1}^{k} 1/p^i.    (4.4.50)

A generalization of this problem, finding the mean time for a pattern, is discussed in Appendix 4.2. ♦
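The closed form (4.4.50) can be checked exactly against the recursion (4.4.49), and against simulation; the sketch below (plain Python; the trial count, seed, and the case p = 1/2, r = 3, where α_3 = 2 + 4 + 8 = 14, are arbitrary choices) does both.

```python
import random
from fractions import Fraction as F

def alpha(k, p):
    # closed form (4.4.50): sum_{i=1}^{k} p^{-i}
    return sum(F(1) / p ** i for i in range(1, k + 1))

p = F(1, 2)
for k in range(1, 8):
    # (4.4.49) rearranged: alpha_{k+1} = (alpha_k + 1) / p
    assert alpha(k + 1, p) == (alpha(k, p) + 1) / p
assert alpha(3, p) == 14

# Monte Carlo estimate of the mean number of tosses until 3 straight heads
rng = random.Random(0)
def tosses_until_run(r, prob):
    count = run = 0
    while run < r:
        count += 1
        run = run + 1 if rng.random() < prob else 0
    return count

trials = 20000
avg = sum(tosses_until_run(3, 0.5) for _ in range(trials)) / trials
assert abs(avg - 14.0) < 0.5
print("alpha_3 =", alpha(3, p), " simulated mean ~", round(avg, 2))
```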

4.5 Impulse Functions and Random Vectors

As we have observed in Examples 2.5.23 and 3.1.34, the unit step and impulse
functions are quite useful in representing the cdf and pdf of discrete and hybrid
random variables. In addition, the unit step and impulse functions can be used for
obtaining joint cdf’s and joint pdf’s expressed in several formulas depending on the
condition.
Example 4.5.1 Obtain the joint cdf F_{X,X+a} and joint pdf f_{X,X+a} of X and Y = X + a for a random variable X with pdf f_X and cdf F_X, where a is a constant.

Solution The joint cdf F_{X,X+a}(x, y) = P(X ≤ x, X ≤ y − a) of X and Y = X + a can be obtained as F_{X,X+a}(x, y) = P(X ≤ x) for x ≤ y − a and F_{X,X+a}(x, y) = P(X ≤ y − a) for x ≥ y − a, i.e.,

F_{X,X+a}(x, y) = F_X(min(x, y − a))
               = F_X(x)u(y − x − a) + F_X(y − a)u(x − y + a),    (4.5.1)

where it is assumed that u(0) = 1/2. If we differentiate (4.5.1), we get the joint pdf f_{X,X+a}(x, y) = ∂^2/∂x∂y F_{X,Y}(x, y) = ∂/∂y {f_X(x)u(y − x − a) − F_X(x)δ(y − x − a) + F_X(y − a)δ(x − y + a)} of X and Y = X + a as^18

f_{X,X+a}(x, y) = f_X(x)δ(y − x − a)    (4.5.2)

by noting that δ(t) = δ(−t) as shown in (1.4.36) and that F_X(x)δ(y − x − a) = F_X(y − a)δ(x − y + a) from F_X(t)δ(t − a) = F_X(a)δ(t − a) as observed in (1.4.42). The result (4.5.2) can be written also as f_{X,X+a}(x, y) = f_X(y − a)δ(y − x − a). Another derivation of (4.5.2) based on F_{X,X+a}(x, y) = F_X(min(x, y − a)) in (4.5.1) is discussed in Exercise 4.72. ♦

Example 4.5.2 Obtain the joint cdf F_{X,cX} and the joint pdf f_{X,cX} of X and Y = cX for a continuous random variable X with pdf f_X and cdf F_X, where c is a constant.

Solution For c > 0, the joint cdf F_{X,cX}(x, y) = P(X ≤ x, cX ≤ y) of X and Y = cX can be obtained as

F_{X,cX}(x, y) = { P(X ≤ x), x ≤ y/c; P(X ≤ y/c), x ≥ y/c }
              = F_X(x)u(y/c − x) + F_X(y/c)u(x − y/c).    (4.5.3)

For c < 0, the joint cdf is F_{X,cX}(x, y) = P(X ≤ x, X ≥ y/c), i.e.,

F_{X,cX}(x, y) = { 0, x ≤ y/c; P(y/c ≤ X ≤ x), x > y/c }
              = {F_X(x) − F_X(y/c)} u(x − y/c).    (4.5.4)

For c = 0, we have the joint cdf F_{X,cX}(x, y) = P(X ≤ x, cX ≤ y) as

F_{X,cX}(x, y) = { 0, y < 0; P(X ≤ x), y ≥ 0 } = F_X(x)u(y)    (4.5.5)

with u(0) = 1. Collecting (4.5.3)–(4.5.5), we eventually have

F_{X,cX}(x, y) = { {F_X(x) − F_X(y/c)} u(x − y/c), c < 0;
                  F_X(x)u(y), c = 0;
                  F_X(x)u(y/c − x) + F_X(y/c)u(x − y/c), c > 0.    (4.5.6)

We now obtain the joint pdf of X and Y = cX by differentiating (4.5.6). First, recollect that F_X(x)δ(x − y/c) = F_X(y/c)δ(x − y/c) from δ(t) = δ(−t) and F_X(t)δ(t − a) = F_X(a)δ(t − a). Then, for c < 0, the joint pdf f_{X,cX}(x, y) = ∂^2/∂x∂y F_{X,cX}(x, y) of X and Y = cX can be obtained as

f_{X,cX}(x, y) = ∂/∂y [ f_X(x)u(x − y/c) + {F_X(x) − F_X(y/c)} δ(x − y/c) ]
              = −(1/c) f_X(x)δ(x − y/c).    (4.5.7)

Similarly, we get f_{X,cX}(x, y) = ∂/∂y { f_X(x)u(y/c − x) − F_X(x)δ(y/c − x) + F_X(y/c)δ(x − y/c) }, i.e.,

f_{X,cX}(x, y) = (1/c) f_X(x)δ(y/c − x)    (4.5.8)

for c > 0, and f_{X,cX}(x, y) = ∂^2/∂x∂y F_X(x)u(y) = f_X(x)δ(y) for c = 0: in short,

f_{X,cX}(x, y) = { f_X(x)δ(y), c = 0;
                  (1/|c|) f_X(x)δ(x − y/c), c ≠ 0.    (4.5.9)

Note that ∫_{−∞}^{∞}∫_{−∞}^{∞} f_{X,cX}(x, y)dy dx = ∫_{−∞}^{∞} f_X(x)dx ∫_{−∞}^{∞} δ(y)dy = 1 for c = 0. For c ≠ 0, we have ∫_{−∞}^{∞}∫_{−∞}^{∞} f_{X,cX}(x, y)dy dx = (1/|c|) ∫_{−∞}^{∞} f_X(x) {∫_{−∞}^{∞} δ(x − y/c)dy} dx. From ∫_{−∞}^{∞} δ(x − y/c)dy = ∫_{∞}^{−∞} δ(x − t)(c dt) = −c for c < 0 and ∫_{−∞}^{∞} δ(x − y/c)dy = ∫_{−∞}^{∞} δ(x − t)(c dt) = c for c > 0, we have ∫_{−∞}^{∞} δ(x − y/c)dy = |c|. Therefore, ∫_{−∞}^{∞}∫_{−∞}^{∞} f_{X,cX}(x, y)dy dx = 1. ♦

^18 Here, ∫_{−∞}^{∞}∫_{−∞}^{∞} f_X(x)δ(y − x − a)dy dx = ∫_{−∞}^{∞} f_X(x)dx = 1.

Letting a = 0 in Example 4.5.1, c = 1 in Example 4.5.2, or a = 0 and c = 1 in


Exercise 4.68, we get

f X,X (x, y) = f X (x)δ(x − y) (4.5.10)

or f X,X (x, y) = f X (x)δ(y − x) = f X (y)δ(x − y) = f X (y)δ(y − x). Let us now


consider the joint distribution of a random variable and its absolute value.

Example 4.5.3 Obtain the joint cdf F_{X,|X|} and the joint pdf f_{X,|X|} of X and Y = |X| for a continuous random variable X with pdf f_X and cdf F_X.

Solution First, the joint cdf F_{X,|X|}(x, y) = P(X ≤ x, |X| ≤ y) of X and Y = |X| can be obtained as (Bae et al. 2006)

F_{X,|X|}(x, y) = { P(−y ≤ X ≤ x), −y < x < y, y > 0;
                   P(−y ≤ X ≤ y), x > y, y > 0;
                   0, otherwise }
               = u(y + x)u(y − x)G_1(x, y) + u(y)u(x − y)G_1(y, y),    (4.5.11)

where

G_1(x, y) = F_X(x) − F_X(−y)    (4.5.12)

satisfies G_1(x, −x) = 0. We have used u(y + x)u(y − x)u(y) = u(y − x)u(y + x) in obtaining (4.5.11). Note that F_{X,|X|}(x, y) = 0 for y < 0 because u(x + y)u(y − x) = 0 from (x + y)(y − x) = y^2 − x^2 < 0.

Noting that

δ(y − a) f_X(y) = δ(y − a) f_X(a),    (4.5.13)

G_1(x, −x) = 0, δ(x − y) = δ(y − x), and u(αx) = u(x) for α > 0, we get

∂/∂x F_{X,|X|}(x, y) = δ(x + y)u(y − x)G_1(x, y) − u(y + x)δ(y − x)G_1(x, y) + u(x + y)u(y − x) f_X(x) + u(y)δ(x − y)G_1(y, y)
                    = δ(x + y)u(−2x)G_1(x, −x) − u(2x)δ(y − x)G_1(x, x) + u(x + y)u(y − x) f_X(x) + u(x)δ(x − y)G_1(x, x)
                    = u(x + y)u(y − x) f_X(x)    (4.5.14)

by differentiating (4.5.11) with respect to x. Consequently, the joint pdf f_{X,|X|}(x, y) = ∂^2/∂x∂y F_{X,|X|}(x, y) of X and Y = |X| is

f_{X,|X|}(x, y) = {δ(x + y)u(y − x) + u(x + y)δ(y − x)} f_X(x)
               = {δ(x + y)u(−x) + δ(x − y)u(x)} f_X(x)
               = {δ(x + y) f_X(−y) + δ(x − y) f_X(y)} u(y).    (4.5.15)

Obtaining the pdf f_{|X|}(y) = ∫_{−∞}^{∞} f_{X,|X|}(x, y)dx of |X| from (4.5.15), we get

f_{|X|}(y) = ∫_{−∞}^{∞} {δ(x + y) f_X(−y) + δ(x − y) f_X(y)} u(y)dx
          = {f_X(−y) + f_X(y)} u(y),    (4.5.16)

which is equivalent to that obtained in Example 3.2.7. From u(x) + u(−x) = 1, we also have ∫_{−∞}^{∞} f_{X,|X|}(x, y)dy = f_X(x)u(−x) + f_X(x)u(x) = f_X(x). ♦
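Relation (4.5.16) can be illustrated with a concrete distribution not used in the example itself: for a standard Gaussian X (an assumption made only for this sketch), f_{|X|}(y) = 2f_X(y) for y > 0, so P(|X| ≤ y) = 2Φ(y) − 1 = erf(y/√2). The sketch below (plain Python; sample size, seed, and tolerance are arbitrary) compares that cdf with an empirical one.

```python
import math
import random

rng = random.Random(1)
samples = [abs(rng.gauss(0.0, 1.0)) for _ in range(100000)]

def cdf_absX(y):
    # P(|X| <= y) = 2 Phi(y) - 1 = erf(y / sqrt(2)) for X ~ N(0, 1),
    # a consequence of f_{|X|}(y) = {f_X(-y) + f_X(y)} u(y) in (4.5.16)
    return math.erf(y / math.sqrt(2.0))

for y in [0.5, 1.0, 2.0]:
    empirical = sum(1 for s in samples if s <= y) / len(samples)
    assert abs(empirical - cdf_absX(y)) < 0.01
print("empirical cdf of |X| matches 2*Phi(y) - 1")
```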

Example 4.5.4 Using (4.5.16), it is easy to see that Y = |X | ∼ U (0, 1) when X ∼


U (−1, 1), X ∼ U (−1, 0), or X ∼ U [0, 1). ♦

When the Jacobian J(g(x)) is 0, Theorem 4.2.1 is not applicable as mentioned in the paragraph just before Sect. 4.2.1.2. We have J(g(x)) = 0 if a function g = (g_1, g_2, . . ., g_n) in the n-dimensional space is in the form, for example, of

g_j(x) = c_j g_1(x) + a_j    (4.5.17)

for j = 2, 3, . . ., n, where {a_j}_{j=2}^{n} and {c_j}_{j=2}^{n} are all constants. Let us discuss how we can obtain the pdf of (Y_1, Y_2) = g(X) = (g_1(X), g_2(X)) in the two-dimensional case, where

g_2(x) = c g_1(x) + a    (4.5.18)

with a and c constants. First, we choose an appropriate auxiliary variable, for example, Z = X_2 or Z = X_1 when g_1(X) is not or is, respectively, in the form of dX_2 + b. Then, using Theorem 4.2.1, we obtain the joint pdf f_{Y_1,Z} of (Y_1, Z). We next obtain the pdf of Y_1 as

f_{Y_1}(x) = ∫_{−∞}^{∞} f_{Y_1,Z}(x, v)dv    (4.5.19)

from f_{Y_1,Z}. Subsequently, we get the joint pdf

f_{Y_1,Y_2}(x, y) = { f_{Y_1}(x)δ(y − a), c = 0;
                     (1/|c|) f_{Y_1}(x)δ(x − (y − a)/c), c ≠ 0    (4.5.20)

of (Y_1, Y_2) = (g_1(X), cg_1(X) + a) using (4.E.18).

Example 4.5.5 Obtain the joint pdf of (Y1 , Y2 ) = (X 1 − X 2 , 2X 1 − 2X 2 ) and the


joint pdf of (Y1 , Y3 ) = (X 1 − X 2 , X 1 − X 2 + 2) when the joint pdf of X = (X 1 , X 2 )
is f X (x1 , x2 ) = u (x1 ) u (1 − x1 ) u (x2 ) u (1 − x2 ).

Solution Because g_1(X) = Y_1 = X_1 − X_2 is not of the form dX_2 + b, we choose the auxiliary variable Z = X_2. Let us then obtain the joint pdf of (Y_1, Z) = (X_1 − X_2, X_2). Noting that X_1 = Y_1 + Z, X_2 = Z, and that the Jacobian is J(∂(x_1 − x_2, x_2)/∂(x_1, x_2)) = det( 1 −1 ; 0 1 ) = 1, the joint pdf of (Y_1, Z) is

f_{Y_1,Z}(y, z) = f_X(x_1, x_2)|_{x_1=y+z, x_2=z}
               = u(y + z)u(1 − y − z)u(z)u(1 − z). (4.5.21)

Fig. 4.17 The support VY of f_{X_1−X_2, X_2}(y, z) when the support of f_X(x_1, x_2) is VX. The intervals of integration in the cases −1 < y < 0 and 0 < y < 1 are also represented as lines with two arrows

We can then obtain the pdf f_{Y_1}(y) = ∫_{−∞}^{∞} u(y + z)u(1 − y − z)u(z)u(1 − z)dz of Y_1 = g_1(X) = X_1 − X_2 as

f_{Y_1}(y) = { 0,              |y| > 1,
            { ∫_{−y}^{1} dz,   −1 < y < 0,
            { ∫_{0}^{1−y} dz,  0 < y < 1
          = { 0,      |y| > 1,
            { 1 + y,  −1 < y < 0,
            { 1 − y,  0 < y < 1
          = (1 − |y|)u(1 − |y|) (4.5.22)

by integrating f_{Y_1,Z}(y, z), for which Fig. 4.17 is useful in identifying the integration intervals.
Next, using (4.5.20), we get the joint pdf

f_{Y_1,Y_2}(x, y) = (1/2)(1 − |x|)u(1 − |x|) δ(x − y/2) (4.5.23)

of (Y_1, Y_2) = (X_1 − X_2, 2X_1 − 2X_2) and the joint pdf

f_{Y_1,Y_3}(x, y) = (1 − |x|)u(1 − |x|) δ(x − y + 2) (4.5.24)

of (Y_1, Y_3) = (X_1 − X_2, X_1 − X_2 + 2). Note that ∫_{−∞}^{∞} δ(x − y/2)dy = ∫_{∞}^{−∞} δ(t)(−2dt) = 2∫_{−∞}^{∞} δ(t)dt = 2. ♦
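A small Monte Carlo check of the triangular pdf (4.5.22) (this sketch is not from the text; the names are illustrative): integrating f_{Y_1}(y) = 1 − |y| over [−t, t] gives P(|X_1 − X_2| ≤ t) = 2t − t² for independent X_1, X_2 ∼ U(0, 1).

```python
import random

def prob_abs_diff_below(t=0.5, n=200_000, seed=3):
    """Estimate P(|X1 - X2| <= t) for independent X1, X2 ~ U(0, 1).

    Integrating the triangular pdf (4.5.22) over [-t, t] gives the exact
    value 2*t - t**2, e.g. 0.75 for t = 0.5.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if abs(rng.random() - rng.random()) <= t)
    return hits / n
```

The estimate should agree with 2t − t² to within sampling error.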

We have briefly discussed how we can obtain the joint pdf and joint cdf in some special cases by employing the unit step and impulse functions. This approach is also quite fruitful in dealing with order statistics and rank statistics.

Appendices

Appendix 4.1 Multinomial Random Variables

Let us discuss in more detail the multinomial random variables introduced in Exam-
ple 4.1.4.
Definition 4.A.1 (multinomial distribution) Assume n repetitions of an independent experiment of which the outcomes are a collection {A_i}_{i=1}^{r} of disjoint events with probabilities {P(A_i) = p_i}_{i=1}^{r}, where Σ_{i=1}^{r} p_i = 1. Denote by X_i the number of occurrences of event A_i. Then, the joint distribution of X = (X_1, X_2, ..., X_r) is called the multinomial distribution, and the joint pmf of X is

p_X(k_1, k_2, ..., k_r) = n!/(k_1! k_2! ··· k_r!) p_1^{k_1} p_2^{k_2} ··· p_r^{k_r} (4.A.1)

for {k_i ∈ {0, 1, ..., n}}_{i=1}^{r} and Σ_{i=1}^{r} k_i = n.
The right-hand side of (4.A.1) is the coefficient of ∏_{j=1}^{r} t_j^{k_j} in the multinomial expansion of (p_1 t_1 + p_2 t_2 + ··· + p_r t_r)^n.

Example 4.A.1 In a repetition of rolling a fair die ten times, let {X_i}_{i=1}^{3} be the numbers of occurrences of A_1 = {1}, A_2 = {an even number}, and A_3 = {3, 5}, respectively. Then, the joint pmf of X = (X_1, X_2, X_3) is

p_X(k_1, k_2, k_3) = 10!/(k_1! k_2! k_3!) (1/6)^{k_1} (1/2)^{k_2} (1/3)^{k_3} (4.A.2)

for {k_i ∈ {0, 1, ..., 10}}_{i=1}^{3} such that Σ_{i=1}^{3} k_i = 10. Based on this pmf, the probability of the event {three times of A_1, six times of A_2} = {X_1 = 3, X_2 = 6, X_3 = 1} can be obtained as p_X(3, 6, 1) = 10!/(3! 6! 1!) (1/6)^3 (1/2)^6 (1/3)^1 = 35/1728 ≈ 2.025 × 10^{−2}. ♦

Example 4.A.2 As in the binomial distribution, let us consider the approximation of the multinomial distribution in terms of the Poisson distribution. For p_i → 0 and np_i → λ_i when n → ∞, we have

n!/(n − Σ_{i=1}^{r−1} k_i)! = n(n − 1) ··· (n − Σ_{i=1}^{r−1} k_i + 1) ≈ n^{Σ_{i=1}^{r−1} k_i},

p_r = 1 − Σ_{i=1}^{r−1} p_i ≈ exp(−Σ_{i=1}^{r−1} p_i), and

p_r^{k_r} = p_r^{n − Σ_{i=1}^{r−1} k_i} ≈ exp(−(n − Σ_{i=1}^{r−1} k_i) Σ_{i=1}^{r−1} p_i) ≈ exp(−n Σ_{i=1}^{r−1} p_i) = exp(−λ),

where λ = Σ_{i=1}^{r−1} λ_i. Based on these results, we can show that p_X(k_1, k_2, ..., k_r) = n!/(k_1! k_2! ··· k_r!) p_1^{k_1} p_2^{k_2} ··· p_r^{k_r} → n^{k_1 + k_2 + ··· + k_{r−1}}/(k_1! k_2! ··· k_{r−1}!) p_1^{k_1} p_2^{k_2} ··· p_{r−1}^{k_{r−1}} exp(−λ), i.e.,

p_X(k_1, k_2, ..., k_r) → ∏_{i=1}^{r−1} e^{−λ_i} λ_i^{k_i}/k_i!. (4.A.3)

The result (4.A.3) with r = 2 is clearly the same as (3.5.19) obtained in Theorem 3.5.2 for the binomial distribution. ♦
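To see the limit (4.A.3) numerically, the sketch below (an illustration, not from the text; the helper names and the chosen n, p values are my own) compares the exact multinomial pmf with the product of Poisson pmfs for small p_i and large n.

```python
from math import exp, factorial

def multinomial_pmf(counts, probs):
    """Multinomial pmf (4.A.1), redefined here so the sketch is self-contained."""
    coeff, value = factorial(sum(counts)), 1.0
    for k, p in zip(counts, probs):
        coeff //= factorial(k)
        value *= p ** k
    return coeff * value

def poisson_product(counts, lams):
    """Right-hand side of (4.A.3): a product of independent Poisson pmfs."""
    value = 1.0
    for k, lam in zip(counts, lams):
        value *= exp(-lam) * lam ** k / factorial(k)
    return value

# n = 2000 with p_1 = p_2 = 0.001 (so lambda_1 = lambda_2 = 2): the joint
# pmf of (X_1, X_2) should be close to Poisson(2) x Poisson(2).
n, p1, p2 = 2000, 0.001, 0.001
exact = multinomial_pmf((3, 1, n - 4), (p1, p2, 1 - p1 - p2))
approx = poisson_product((3, 1), (n * p1, n * p2))
```

For these values the two numbers agree to well within a percent.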
Example 4.A.3 For k_i = np_i + O(√n) and n → ∞, the multinomial pmf can be approximated as

p_X(k_1, k_2, ..., k_r) = n!/(k_1! k_2! ··· k_r!) p_1^{k_1} p_2^{k_2} ··· p_r^{k_r}
 ≈ 1/√((2πn)^{r−1} p_1 p_2 ··· p_r) exp(−(1/2) Σ_{i=1}^{r} (k_i − np_i)²/(np_i)). (4.A.4)

Consider the case of r = 2 in (4.A.4). Letting k_1 = k, k_2 = n − k, p_1 = p, and p_2 = 1 − p_1 = q, we get p_X(k_1, k_2) = p_X(k, n − k) as

p_X(k_1, k_2) ≈ (1/√(2πnpq)) exp(−(1/2){(k − np)²/(np) + (n − k − nq)²/(nq)})
 = (1/√(2πnpq)) exp(−(1/2){q(k − np)² + p(np − k)²}/(npq))
 = (1/√(2πnpq)) exp(−(k − np)²/(2npq)), (4.A.5)

which is the same as (3.5.16) of Theorem 3.5.1 for the binomial distribution. ♦

The multinomial distribution (Johnson and Kotz 1972) is a generalization of the binomial distribution, and the special case r = 2 of the multinomial distribution is the binomial distribution. For the multinomial random vector X = (X_1, X_2, ..., X_r), the marginal distribution of X_i is a binomial distribution. In addition, regarding n − Σ_{i=1}^{s} X_{a_i} as the (s + 1)-st random variable, the distribution of the subvector X_s = (X_{a_1}, X_{a_2}, ..., X_{a_s}) of X is also a multinomial distribution with the joint pmf

p_{X_s}(k_{a_1}, k_{a_2}, ..., k_{a_s}) = n! (1 − Σ_{i=1}^{s} p_{a_i})^{n − Σ_{i=1}^{s} k_{a_i}} / (n − Σ_{i=1}^{s} k_{a_i})! × ∏_{i=1}^{s} p_{a_i}^{k_{a_i}}/k_{a_i}!. (4.A.6)

Letting s = 1 in (4.A.6), we get the binomial pmf p_{X_a}(k_a) = n!/((n − k_a)! k_a!) (1 − p_a)^{n−k_a} p_a^{k_a}.
In addition, when a subvector of X is given, the conditional joint distribution of the
random vector of the remaining random variables is also a multinomial distribution,
which depends not on the individual remaining random variables but on the sum of the
remaining random variables. For example, assume X = (X 1 , X 2 , X 3 , X 4 ). Then, the
joint distribution of (X 2 , X 4 ) when (X 1 , X 3 ) is given is a multinomial distribution,
which depends not on X 1 and X 3 individually but on the sum X 1 + X 3 .
Finally, when X = (X_1, X_2, ..., X_r) has the pmf (4.A.1), it is known that

E{X_i | X_{b_1}, X_{b_2}, ..., X_{b_{r−1}}} = (n − Σ_{j=1}^{r−1} X_{b_j}) p_i / (1 − Σ_{j=1}^{r−1} p_{b_j}), (4.A.7)

where i is not equal to any of {b_j}_{j=1}^{r−1}. It is also known that we have the conditional expected value

E{X_i | X_j} = (n − X_j) p_i / (1 − p_j), (4.A.8)

the correlation coefficient

ρ_{X_i, X_j} = −√( p_i p_j / {(1 − p_i)(1 − p_j)} ), (4.A.9)

and Cov(X_i, X_j) = −n p_i p_j for X_i and X_j with i ≠ j.
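The moment formulas above can be verified by brute-force enumeration for a tiny multinomial (a sketch, not from the text; feasible only for very small n and r):

```python
from itertools import product

def multinomial_cov(n, probs, i, j):
    """Exact Cov(X_i, X_j) for a multinomial(n; p_1, ..., p_r), computed by
    enumerating all r**n outcome sequences; should equal -n * p_i * p_j."""
    r = len(probs)
    e_i = e_j = e_ij = 0.0
    for seq in product(range(r), repeat=n):
        prob = 1.0
        for s in seq:
            prob *= probs[s]
        ki, kj = seq.count(i), seq.count(j)
        e_i += ki * prob
        e_j += kj * prob
        e_ij += ki * kj * prob
    return e_ij - e_i * e_j
```

For n = 4 and (p_1, p_2, p_3) = (0.2, 0.3, 0.5), the enumeration gives Cov(X_1, X_2) = −4 × 0.2 × 0.3 = −0.24, as (4.A.9) and the covariance formula predict.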

Appendix 4.2 Mean Time to Pattern

Denote by X k the outcome of the k-th trial of an experiment with the pmf

p_{X_k}(j) = p_j (4.A.10)

for j = 1, 2, ..., where Σ_{j=1}^{∞} p_j = 1. The number of trials of the experiment until a pattern M = (i_1, i_2, ..., i_n) is observed for the first time is called the time to
pattern M, which is denoted by T = T (M) = T (i 1 , i 2 , . . . , i n ). For example, when
the sequence of the outcomes is (6, 4, 9, 5, 5, 9, 5, 7, 3, 2, . . .), the time to pattern
(9, 5, 7) is T (9, 5, 7) = 8. Now, let us obtain (Nielsen 1973) the mean time E{T (M)}
for the pattern M.
First, when M satisfies
318 4 Random Vectors

(i_1, i_2, ..., i_k) = (i_{n−k+1}, i_{n−k+2}, ..., i_n) (4.A.11)

for n = 2, 3, ... and some k ∈ {1, 2, ..., n − 1}, the pattern M overlaps, and L_k = (i_1, i_2, ..., i_k) is an overlapping piece or a bifix of M. For instance, (A, B, C), (D, E, F, G), (S, S, P), (4, 4, 5), and (4, 1, 3, 3, 2) are non-overlapping patterns; and (A, B, G, A, B), (9, 9, 2, 4, 9, 9), (3, 4, 3), (5, 4, 5, 4, 5), and (5, 4, 5, 4, 5, 4) are overlapping patterns. Note that the length k of an overlapping piece can be longer than n/2 and that more than one overlapping piece may exist in a pattern, as in (5, 4, 5, 4, 5) and (5, 4, 5, 4, 5, 4). In addition, when the overlapping piece is of length k, the elements in the pattern repeat with period n − k: for instance, M = (i_1, i_1, ..., i_1) when k = n − 1. A non-overlapping pattern can be regarded as an overlapping pattern with k = n.

(A) A Recursive Method


First, the mean time E{T(i_1)} = Σ_{k=1}^{∞} k P(T(i_1) = k) = Σ_{k=1}^{∞} k (1 − p_{i_1})^{k−1} p_{i_1} to pattern i_1 of length 1 is

E{T(i_1)} = 1/p_{i_1}. (4.A.12)

When M has J overlapping pieces, let the lengths of the overlapping pieces be K_0 < K_1 < ··· < K_J < K_{J+1} with K_0 = 0 and K_{J+1} = n, and express M as M = (i_1, i_2, ..., i_{K_1}, i_{K_1+1}, ..., i_{K_2}, i_{K_2+1}, ..., i_{K_J}, i_{K_J+1}, ..., i_{n−1}, i_n). If we write the overlapping pieces {L_{K_j} = (i_1, i_2, ..., i_{K_j})}_{j=1}^{J} as

i_1 i_2 ··· i_{K_1}
i_1 i_2 ··· i_{K_{2,1}+1} i_{K_{2,1}+2} ··· i_{K_2}
 ⋮
i_1 i_2 ··· i_{K_{J,2}+1} i_{K_{J,2}+2} ··· i_{K_{J,1}+1} i_{K_{J,1}+2} ··· i_{K_J}, (4.A.13)

where K_{α,β} = K_α − K_β, then we have i_m = i_{K_{b,a}+m} for 1 ≤ a ≤ b ≤ J and m = 1, 2, ..., K_a − K_{a−1} because the values at the same column in (4.A.13) are all the same.
Denote by T(A_1) the time to wait until the occurrence of M_{+1} = (i_1, i_2, ..., i_n, i_{n+1}) after the occurrence of M. Then, we have

E{T(M_{+1})} = E{T(M)} + E{T(A_1)} (4.A.14)

because T(M_{+1}) = T(M) + T(A_1). Here, we can express E{T(A_1)} as





E{T(A_1)} = Σ_{x=1}^{∞} E{T(A_1) | X_{n+1} = x} P(X_{n+1} = x). (4.A.15)

Let us focus on the term E{T(A_1) | X_{n+1} = x} in (4.A.15). First, when x = i_{K_j+1} for example, denote by L̃_{K_j+1} = (i_1, i_2, ..., i_{K_j}, i_{K_j+1}) the j-th overlapping piece with its immediate next element, and recollect (4.A.11). Then, we have

M_{+1} = (i_1, i_2, ..., i_{n−K_j}, i_{n−K_j+1}, i_{n−K_j+2}, ..., i_n, i_{K_j+1})
       = (i_1, i_2, ..., i_{n−K_j}, i_1, i_2, ..., i_{K_j}, i_{K_j+1}), (4.A.16)

from which we can get E{T(A_1) | X_{n+1} = i_{K_j+1}} = 1 + E{T(M_{+1}) | L̃_{K_j+1}}. We can similarly get
E{T(A_1) | X_{n+1} = x} = { 1 + E{T(M_{+1}) | L̃_{K_0+1}},  x = i_{K_0+1},
                          { 1 + E{T(M_{+1}) | L̃_{K_1+1}},  x = i_{K_1+1},
                          {  ⋮                               ⋮
                          { 1 + E{T(M_{+1}) | L̃_{K_J+1}},  x = i_{K_J+1},
                          { 1,                              x = i_{n+1},
                          { 1 + E{T(M_{+1})},               otherwise    (4.A.17)

when i_1 = i_{K_0+1}, i_{K_1+1}, ..., i_{K_J+1}, i_{K_{J+1}+1} = i_{n+1} are all distinct. Here, recollecting


     

E{T(M_{+1}) | L̃_{K_j+1}} = E{T(M_{+1})} − E{T(L̃_{K_j+1})} (4.A.18)

from E{T(M_{+1})} = E{T(M_{+1}) | L̃_{K_j+1}} + E{T(L̃_{K_j+1})}, we get

E{T(M_{+1})} = E{T(M)} + Σ_{j=0}^{J} p_{i_{K_j+1}} {1 + E{T(M_{+1})} − E{T(L̃_{K_j+1})}}
             + p_{i_{n+1}} × 1 + (1 − p_{i_{n+1}} − Σ_{j=0}^{J} p_{i_{K_j+1}}) × {1 + E{T(M_{+1})}} (4.A.19)

from (4.A.14), (4.A.15), and (4.A.17). We can rewrite (4.A.19) as

p_{i_{n+1}} E{T(M_{+1})} = E{T(M)} + 1 − Σ_{j=0}^{J} p_{i_{K_j+1}} E{T(L̃_{K_j+1})} (4.A.20)

after some steps.



Let us next consider the case in which some are the same among i_{K_0+1}, i_{K_1+1}, ..., i_{K_J+1}, and i_{n+1}. For example, assume a < b and i_{K_a+1} = i_{K_b+1}. Then, for x = i_{K_a+1} = i_{K_b+1} in (4.A.15) and (4.A.17), the line '1 + E{T(M_{+1}) | L̃_{K_a+1}}, x = i_{K_a+1}' corresponding to the K_a-th piece among the lines of (4.A.17) will disappear because the longest overlapping piece in the last part of M_{+1} is not L̃_{K_a+1} but L̃_{K_b+1}. Based on this fact, if we follow steps similar to those leading to (4.A.19) and (4.A.20), we get

p_{i_{n+1}} E{T(M_{+1})} = E{T(M)} + 1 − Σ′_j p_{i_{K_j+1}} E{T(L̃_{K_j+1})}, (4.A.21)

where Σ′_j denotes the sum from j = 0 to J with every E{T(L̃_{K_a+1})} set to 0 when i_{K_a+1} = i_{K_b+1} for 0 ≤ a < b ≤ J + 1. Note here that {K_j}_{j=1}^{J} are the lengths of the overlapping pieces of M, not of M_{+1}. Note also that (4.A.20) is a special case of (4.A.21): in other words, (4.A.21) is always applicable.

In essence, starting from E{T(i_1)} = 1/p_{i_1} shown in (4.A.12), we can successively obtain E{T(i_1, i_2)}, E{T(i_1, i_2, i_3)}, ..., E{T(M)} based on (4.A.21).

Example 4.A.4 For i.i.d. random variables {X_k}_{k=1}^{∞} with the marginal pmf p_{X_k}(j) = p_j, obtain the mean time to M = (5, 4, 5, 3).

Solution First, E{T(5)} = 1/p_5. When (5, 4) is M_{+1}, because J = 0, i_{K_0+1} = 5, and i_{n+1} = 4, we get

Σ′_j p_{i_{K_j+1}} E{T(L̃_{K_j+1})} = p_{i_1} E{T(L̃_1)}, (4.A.22)

i.e., E{T(5, 4)} = (1/p_4){E{T(5)} + 1 − p_5 E{T(5)}} = 1/(p_4 p_5). Next, when (5, 4, 5) is M_{+1}, because J = 0 and i_{K_0+1} = 5 = i_{n+1}, we get Σ′_j p_{i_{K_j+1}} E{T(L̃_{K_j+1})} = 0. Thus, E{T(5, 4, 5)} = (1/p_5){E{T(5, 4)} + 1} = 1/(p_4 p_5²) + 1/p_5. Finally, when (5, 4, 5, 3) is M_{+1}, because J = 1 and K_1 = 1, we have i_{K_0+1} = i_1 = 5, i_{K_1+1} = i_2 = 4, i_{K_{J+1}+1} = i_4 = 3, and

Σ′_j p_{i_{K_j+1}} E{T(L̃_{K_j+1})} = p_5 E{T(5)} + p_4 E{T(5, 4)}, (4.A.23)

i.e., E{T(5, 4, 5, 3)} = (1/p_3){E{T(5, 4, 5)} + 1 − p_5 E{T(5)} − p_4 E{T(5, 4)}} = 1/(p_3 p_4 p_5²). ♦
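A Monte Carlo check of Example 4.A.4 (an illustrative sketch, not from the text; the alphabet {3, 4, 5} with equal probabilities is my own choice): with p_3 = p_4 = p_5 = 1/3, the result gives E{T(5, 4, 5, 3)} = 1/(p_3 p_4 p_5²) = 3⁴ = 81.

```python
import random

def simulated_mean_time(pattern, symbols, trials=5000, seed=11):
    """Average number of i.i.d. draws (uniform over `symbols`) until
    `pattern` first appears as a contiguous block."""
    rng = random.Random(seed)
    n, total = len(pattern), 0
    target = list(pattern)
    for _ in range(trials):
        window, t = [], 0
        while window != target:
            window.append(rng.choice(symbols))
            t += 1
            if len(window) > n:   # keep only the last n outcomes
                window.pop(0)
        total += t
    return total / trials
```

A few thousand trials are enough to land close to the theoretical value 81.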
Appendices 321

(B) An Efficient Method

The result (4.A.21) is always applicable. However, as we have observed in Example 4.A.4, (4.A.21) possesses some inefficiency in the sense that we have to first obtain the expected values E{T(i_1)}, E{T(i_1, i_2)}, ..., E{T(i_1, i_2, ..., i_{n−1})} before we can obtain the expected value E{T(i_1, i_2, ..., i_n)}. Let us now consider a more efficient method.

(B-1) Non-overlapping Patterns

When the pattern M is non-overlapping, we have

(i_1, i_2, ..., i_k) ≠ (i_{n−k+1}, i_{n−k+2}, ..., i_n) (4.A.24)

for every k ∈ {1, 2, ..., n − 1}. Based on this observation, let us show that

{T = j + n} = {T > j, (X_{j+1}, X_{j+2}, ..., X_{j+n}) = M}. (4.A.25)

First, when T = j + n, the first occurrence of M is (X_{j+1}, X_{j+2}, ..., X_{j+n}), which implies that T > j and

(X_{j+1}, X_{j+2}, ..., X_{j+n}) = M. (4.A.26)

Next, let us show that T = j + n when T > j and (4.A.26) holds true. If k ∈ {1, 2, ..., n − 1} and T = j + k, then we have X_{j+k} = i_n, X_{j+k−1} = i_{n−1}, ..., X_{j+1} = i_{n−k+1}. This is a contradiction to (X_{j+1}, X_{j+2}, ..., X_{j+k}) = (i_1, i_2, ..., i_k) ≠ (i_{n−k+1}, i_{n−k+2}, ..., i_n) implied by (4.A.24) and (4.A.26). In short, for any value k in {1, 2, ..., n − 1}, we have T ≠ j + k and thus T ≥ j + n. Meanwhile, (4.A.26) implies T ≤ j + n. Thus, we get T = j + n.
From (4.A.25), we have

P(T = j + n) = P(T > j, (X_{j+1}, X_{j+2}, ..., X_{j+n}) = M). (4.A.27)

Here, the event T > j is dependent only on X_1, X_2, ..., X_j but not on X_{j+1}, X_{j+2}, ..., X_{j+n}, and thus

P(T = j + n) = P(T > j) P((X_{j+1}, X_{j+2}, ..., X_{j+n}) = M)
             = P(T > j) p̂, (4.A.28)

where p̂ = p_{i_1} p_{i_2} ··· p_{i_n}. Now, recollecting that Σ_{j=0}^{∞} P(T = j + n) = 1 and that Σ_{j=0}^{∞} P(T > j) = P(T > 0) + P(T > 1) + ··· = {P(T = 1) + P(T = 2) + ···} + {P(T = 2) + P(T = 3) + ···} + ··· = Σ_{j=0}^{∞} j P(T = j), i.e.,

Σ_{j=0}^{∞} P(T > j) = E{T}, (4.A.29)

we get p̂ Σ_{j=0}^{∞} P(T > j) = p̂ E{T} = 1 from (4.A.28). Thus, we have E{T(M)} = 1/p̂, i.e.,

E{T(M)} = 1/(p_{i_1} p_{i_2} ··· p_{i_n}). (4.A.30)

Example 4.A.5 For the pattern M = (9, 5, 7), we have E{T(9, 5, 7)} = 1/(p_5 p_7 p_9). Thus, to observe the pattern (9, 5, 7), we have to wait on the average until the 1/(p_5 p_7 p_9)-th repetition. In tossing a fair die, we need to repeat E{T(3, 5)} = 1/(p_3 p_5) = 36 times on the average to observe the pattern (3, 5) for the first time. ♦
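Result (4.A.30) can also be confirmed exactly by solving the prefix-matching Markov chain (a sketch under the stated assumptions, not the author's method): state s is the length of the longest matched prefix of M, and the expected absorption time from state 0 is E{T(M)}.

```python
from fractions import Fraction

def mean_time_exact(pattern, probs):
    """Exact E{T(M)} via the prefix-matching Markov chain: state s is the
    length of the longest prefix of `pattern` matching the latest outcomes,
    and E_s = 1 + sum_x p_x * E_{next(s, x)} with E_n = 0.
    `probs` maps each symbol to its probability (use Fractions for exactness).
    """
    n = len(pattern)

    def next_state(s, x):
        # Longest k such that pattern[:k] is a suffix of pattern[:s] + (x,).
        seq = pattern[:s] + (x,)
        for k in range(min(n, len(seq)), 0, -1):
            if seq[len(seq) - k:] == pattern[:k]:
                return k
        return 0

    # Linear system A * E = b over the transient states 0, ..., n-1.
    a = [[Fraction(0)] * n for _ in range(n)]
    b = [Fraction(1) for _ in range(n)]
    for s in range(n):
        a[s][s] += 1
        for x, p in probs.items():
            t = next_state(s, x)
            if t < n:
                a[s][t] -= p
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col] / a[col][col]
                for c in range(n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    return b[0] / a[0][0]
```

For a fair die this gives E{T(3, 5)} = 36, matching (4.A.30), and E{T(5, 4, 5)} = 222, matching the recursion of Example 4.A.4.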

(B-2) Overlapping Patterns

We next consider overlapping patterns. When M is an overlapping pattern, construct a non-overlapping pattern

M_x = (i_1, i_2, ..., i_n, x) (4.A.31)

of length n + 1 by appropriately choosing x as x ∉ {i_1, i_2, ..., i_n} or x ∉ {i_{K_0+1}, i_{K_1+1}, ..., i_{K_J+1}}.¹⁹ Then, from (4.A.30), we have

E{T(M_x)} = 1/(p_x p̂). (4.A.32)

When x = i_{n+1}, using (4.A.32) in (4.A.21), we get

E{T(M)} = 1/p̂ − 1 + Σ′_j p_{i_{K_j+1}} E{T(L̃_{K_j+1})} (4.A.33)

by noting that M_x in (4.A.31) and M_{+1} in (4.A.21) are the same; here, Σ′_j is the primed sum of (4.A.21), i.e., the sum from j = 0 to J with the terms of repeated i_{K_a+1} set to 0. Now, if we consider the case in which M is not an overlapping pattern, the last term of (4.A.33) becomes Σ′_j p_{i_{K_j+1}} E{T(L̃_{K_j+1})} = p_{i_{K_0+1}} E{T(L̃_{K_0+1})} = p_{i_1} E{T(i_1)} = 1. Consequently, (4.A.33) and (4.A.30) are the same. Thus, for any overlapping or non-overlapping pattern M, we can use (4.A.33) to obtain E{T(M)}.

¹⁹ More generally, we can interpret 'appropriate x' as 'any x such that (i_1, i_2, ..., i_n, x) is a non-overlapping pattern', and x can be chosen even if it is not a realization of any X_k. For example, when p_1 + p_2 + p_3 = 1, we could choose x = 7.

Example 4.A.6 In the pattern (9, 5, 1, 9, 5), we have J = 1, K_1 = 2, and L̃_{K_1+1} = (9, 5, 1). Thus, from (4.A.30) and (4.A.33), we get E{T(9, 5, 1, 9, 5)} = 1/(p_1 p_5² p_9²) − 1 + p_9 E{T(9)} + p_1 E{T(9, 5, 1)} = 1/(p_1 p_5² p_9²) + 1/(p_5 p_9). Similarly, in the pattern (9, 5, 9, 1, 9, 5, 9), we get J = 2, K_1 = 1, K_2 = 3, and L̃_{K_1+1} = (9, 5) and L̃_{K_2+1} = (9, 5, 9, 1). Therefore,

E{T(9, 5, 9, 1, 9, 5, 9)} = 1/(p_1 p_5² p_9⁴) − 1 + p_9 E{T(9)} + p_5 E{T(9, 5)}
                            + p_1 E{T(9, 5, 9, 1)}
                          = 1/(p_1 p_5² p_9⁴) + 1/(p_5 p_9²) + 1/p_9 (4.A.34)

from (4.A.30) and (4.A.33). ♦

Comparing Examples 4.A.4 and 4.A.6, it is easy to see that we can obtain E{T (M)}
faster from (4.A.30) and (4.A.33) than from (4.A.21).

Theorem 4.A.1 For a pattern M = (i_1, i_2, ..., i_n) with J overlapping pieces, the mean time to M can be obtained as

E{T(M)} = Σ_{j=1}^{J+1} 1/(p_{i_1} p_{i_2} ··· p_{i_{K_j}}), (4.A.35)

where K_1 < K_2 < ··· < K_J are the lengths of the overlapping pieces with K_{J+1} = n.
     
Proof For convenience, let α_j = p_{i_{K_j+1}} E{T(L̃_{K_j+1})} and β_j = E{T(L_{K_j})}. Also let

ε_j = { 1, if i_{K_j+1} ≠ i_{K_m+1} for every value of m ∈ {j + 1, j + 2, ..., J},
      { 0, otherwise (4.A.36)

for j = 0, 1, ..., J − 1, and ε_J = 1 by noting that the term with j = J is always added in the sum in the right-hand side of (4.A.33). Then, we can rewrite (4.A.33) as

E{T(M)} = 1/(p_{i_1} p_{i_2} ··· p_{i_n}) − 1 + Σ_{j=0}^{J} α_j ε_j. (4.A.37)


Now, α_0 = p_{i_1} E{T(i_1)} = 1 and α_j = β_j + 1 − Σ_{l=0}^{j−1} α_l ε_l for j = 1, 2, ..., J from (4.A.21). Solving for {α_j}_{j=1}^{J}, we get α_1 = β_1 + 1 − ε_0, α_2 = β_2 + 1 − (ε_1 α_1 + ε_0 α_0) = β_2 − ε_1 β_1 + (1 − ε_0)(1 − ε_1), α_3 = β_3 + 1 − (ε_2 α_2 + ε_1 α_1 + ε_0 α_0) = β_3 − ε_2 β_2 − ε_1 (1 − ε_2) β_1 + (1 − ε_0)(1 − ε_1)(1 − ε_2), ..., and

α_J = β_J − ε_{J−1} β_{J−1} − ε_{J−2} (1 − ε_{J−1}) β_{J−2} − ···
      − ε_1 (1 − ε_2)(1 − ε_3) ··· (1 − ε_{J−1}) β_1
      + (1 − ε_0)(1 − ε_1) ··· (1 − ε_{J−1}). (4.A.38)

Therefore,

Σ_{j=0}^{J} α_j ε_j = β_J + (ε_{J−1} − ε_{J−1}) β_{J−1} + {ε_{J−2} − ε_{J−2} ε_{J−1}
 − ε_{J−2} (1 − ε_{J−1})} β_{J−2} + ··· + {ε_1 − ε_1 ε_2 − ε_1 (1 − ε_2) ε_3
 − ··· − ε_1 (1 − ε_2)(1 − ε_3) ··· (1 − ε_{J−1})} β_1 + {ε_0 + (1 − ε_0) ε_1
 + (1 − ε_0)(1 − ε_1) ε_2 + ···
 + (1 − ε_0)(1 − ε_1) ··· (1 − ε_{J−1})}. (4.A.39)

In the right-hand side of (4.A.39), the second, third, ..., second last terms are all 0, and the last term is

ε_0 + (1 − ε_0) ε_1 + (1 − ε_0)(1 − ε_1) ε_2 + ···
 + (1 − ε_0)(1 − ε_1) ··· (1 − ε_{J−3}) ε_{J−2}
 + (1 − ε_0)(1 − ε_1) ··· (1 − ε_{J−2}) ε_{J−1} + (1 − ε_0)(1 − ε_1) ··· (1 − ε_{J−1})
= ε_0 + (1 − ε_0) ε_1 + (1 − ε_0)(1 − ε_1) ε_2 + ···
 + (1 − ε_0)(1 − ε_1) ··· (1 − ε_{J−3}) ε_{J−2} + (1 − ε_0)(1 − ε_1) ··· (1 − ε_{J−2})
 ⋮
= ε_0 + (1 − ε_0) ε_1 + (1 − ε_0)(1 − ε_1)
= 1. (4.A.40)

Thus, noting (4.A.40) and using (4.A.39) in (4.A.37), we get E{T(M)} = 1/(p_{i_1} p_{i_2} ··· p_{i_n}) − 1 + β_J + 1, i.e.,

E{T(M)} = 1/(p_{i_1} p_{i_2} ··· p_{i_n}) + E{T(L_{K_J})}. (4.A.41)
  
Next, if we obtain E{T(L_{K_J})} after some steps similar to those for (4.A.41) by recollecting that the overlapping pieces of L_{K_J} are L_{K_1}, L_{K_2}, ..., L_{K_{J−1}}, we have E{T(L_{K_J})} = 1/(p_{i_1} p_{i_2} ··· p_{i_{K_J}}) + E{T(L_{K_{J−1}})}. Repeating this procedure, and noting that L_{K_1} is not an overlapping pattern, we get (4.A.35) by using (4.A.30). ♠
Example 4.A.7 Using (4.A.35), it is easy to get E{T(5, 4, 4, 5)} = (1 + p_4² p_5)/(p_4² p_5²), E{T(5, 4, 5, 4)} = (1 + p_4 p_5)/(p_4² p_5²), E{T(5, 4, 5, 4, 5)} = 1/(p_4² p_5³) + 1/(p_4 p_5²) + 1/p_5, and E{T(5, 4, 4, 5, 4, 4, 5)} = 1/(p_4⁴ p_5³) + 1/(p_4² p_5²) + 1/p_5. ♦
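Theorem 4.A.1 translates directly into a few lines of code (an illustrative sketch, not from the text; the function name is my own): find every bifix length, append n, and sum the reciprocal prefix probabilities.

```python
def mean_time_via_bifixes(pattern, probs):
    """Mean time to `pattern` via (4.A.35): sum 1/(p_{i_1} ... p_{i_K})
    over every bifix (overlapping-piece) length K of the pattern, plus
    K = n itself. `probs` maps each symbol to its probability."""
    n = len(pattern)
    lengths = [k for k in range(1, n) if pattern[:k] == pattern[n - k:]]
    lengths.append(n)
    total = 0.0
    for k in lengths:
        prod = 1.0
        for sym in pattern[:k]:
            prod *= probs[sym]
        total += 1.0 / prod
    return total

# Example 4.A.7 with p_4 = p_5 = 1/2:
# E{T(5, 4, 5, 4, 5)} = 1/(p_4^2 p_5^3) + 1/(p_4 p_5^2) + 1/p_5 = 32 + 8 + 2 = 42.
```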

Example 4.A.8 Assume a coin with P(h) = p = 1 − P(t), where h and t denote head and tail, respectively. Then, the expected numbers of tosses until the first occurrences of h, tht, htht, hthh, and hthhthh are E{T(h)} = 1/p, E{T(tht)} = 1/(pq²) + 1/q, E{T(htht)} = 1/(p²q²) + 1/(pq), E{T(hthh)} = 1/(p³q) + 1/p, and E{T(hthhthh)} = 1/(p⁵q²) + 1/(p³q) + 1/p, respectively, where q = 1 − p. ♦

Exercises

Exercise 4.1 Show that

f(x) = μe^{−μx}(μx)^{n−1}/(n − 1)! u(x) (4.E.1)

is the pdf of the sum of n i.i.d. exponential random variables with rate μ.

Exercise 4.2 A box contains three red and two green balls. We choose a ball from
the box, discard it, and choose another ball from the box. Let X = 1 and X = 2 when
the first ball is red and green, respectively, and Y = 4 and Y = 3 when the second
ball is red and green, respectively. Obtain the pmf p X of X , pmf pY of Y , joint pmf
p X,Y of X and Y , conditional pmf pY |X of Y given X , conditional pmf p X |Y of X
given Y , and pmf p X +Y of X + Y .

Exercise 4.3 For two i.i.d. random variables X 1 and X 2 with marginal distribution
P(1) = P(−1) = 0.5, let X 3 = X 1 X 2 . Are X 1 , X 2 , and X 3 pairwise independent?
Are they independent?

Exercise 4.4 When the joint pdf of a random vector (X, Y) is f_{X,Y}(x, y) = a(1 + xy(x² − y²)) u(1 − |x|)u(1 − |y|), determine the constant a. Are X and Y independent of each other? If not, obtain the correlation coefficient between X and Y.

Exercise 4.5 A box contains three red, six green, and five blue balls. A ball is chosen randomly from the box and then returned to the box after its color is recorded. After six trials, let the numbers of red and blue balls drawn be R and B, respectively. Obtain the conditional pmf p_{R|B=3} of R when B = 3 and the conditional mean E{R|B = 1} of R when B = 1.

Exercise 4.6 Two binomial random variables X 1 ∼ b (n 1 , p) and X 2 ∼ b (n 2 , p)


are independent of each other. Show that, when X 1 + X 2 = x is given, the conditional
distribution of X 1 is a hypergeometric distribution.

Exercise 4.7 Show that Z = X/(X + Y) ∼ U(0, 1) for two i.i.d. exponential random variables X and Y.

Exercise 4.8 When the joint pdf of X = (X_1, X_2) is

f_X(x_1, x_2) = u(x_1 + 1/2) u(1/2 − x_1) u(x_2 + 1/2) u(1/2 − x_2), (4.E.2)

obtain the joint pdf f_Y of Y = (Y_1, Y_2) = (X_1², X_1 + X_2). Based on the joint pdf f_Y, obtain the pdf f_{Y_1} of Y_1 = X_1² and pdf f_{Y_2} of Y_2 = X_1 + X_2.

Exercise 4.9 When the joint pdf of X_1 and X_2 is f_{X_1,X_2}(x, y) = (1/4)u(1 − |x|)u(1 − |y|), obtain the cdf F_W and pdf f_W of W = √(X_1² + X_2²).

Exercise 4.10 Two random variables X and Y are independent of each other with the
pdf’s f X (x) = λe−λx u(x) and f Y (y) = μe−μy u(y), where λ > 0 and μ > 0. When
W = min(X, Y ) and

1, if X ≤ Y,
V = (4.E.3)
0, if X > Y,

obtain the joint cdf of (W, V ).

Exercise 4.11 Obtain the pdf of U = X + Y + Z when the joint pdf of X, Y, and Z is f_{X,Y,Z}(x, y, z) = 6u(x)u(y)u(z)/(1 + x + y + z)⁴.

Exercise 4.12 Consider the two joint pdf's (1) f_X(x) = u(x_1) u(1 − x_1) u(x_2) u(1 − x_2) and (2) f_X(x) = 2u(x_1) u(1 − x_2) u(x_2 − x_1) of X = (X_1, X_2), where x = (x_1, x_2). In each of the two cases, obtain the joint pdf f_Y of Y = (Y_1, Y_2) = (X_1², X_1 + X_2), and then obtain the pdf f_{Y_1} of Y_1 = X_1² and pdf f_{Y_2} of Y_2 = X_1 + X_2 based on f_Y.

Exercise 4.13 In each of the two cases of the joint pdf f_X described in Exercise 4.12, obtain the joint pdf f_Y of Y = (Y_1, Y_2) = ((1/2)(X_1² + X_2), (1/2)(X_1² − X_2)), and then obtain the pdf f_{Y_1} of Y_1 and pdf f_{Y_2} of Y_2 based on f_Y.

Exercise 4.14 Two random variables X ∼ G(α_1, β) and Y ∼ G(α_2, β) are independent of each other. Show that Z = X + Y and W = X/Y are independent of each other, and obtain the pdf of Z and pdf of W.

Exercise 4.15 Denote the joint pdf of X = (X_1, X_2) by f_X.
(1) Express the pdf of Y_1 = (X_1² + X_2²)^r in terms of f_X.
(2) When f_X(x, y) = (1/π)u(1 − x² − y²), show that the cdf F_W and pdf f_W of W = (X_1² + X_2²)^r are as follows:
 1. F_W(w) = u(w − 1) and f_W(w) = δ(w − 1) if r = 0.
 2. F_W(w) = { 0, w ≤ 0; w^{1/r}, 0 ≤ w ≤ 1; 1, w ≥ 1 } and f_W(w) = (1/r)w^{1/r−1}u(w)u(1 − w) if r > 0.
 3. F_W(w) = { 0, w < 1; 1 − w^{1/r}, w ≥ 1 } and f_W(w) = −(1/r)w^{1/r−1}u(w − 1) if r < 0.
(3) Obtain F_W and f_W when r = 1/2, 1, and −1 in (2).


Exercise 4.16 The marginal pdf of the three i.i.d. random variables X 1 , X 2 , and X 3
is f (x) = u(x)u(1 − x).
(1) Obtain the joint pdf f Y1 ,Y2 of (Y1 , Y2 ) = (X 1 + X 2 + X 3 , X 1 − X 3 ).
(2) Based on f Y1 ,Y2 , obtain the pdf f Y2 of Y2 .
(3) Based on f Y1 ,Y2 , obtain the pdf f Y1 of Y1 .
Exercise 4.17 Consider i.i.d. random variables X and Y with marginal pmf p(x) =
(1 − α)α x−1 ũ(x − 1), where 0 < α < 1.
(1) Obtain the pmf of X + Y and pmf of X − Y .
(2) Obtain the joint pmf of (X − Y, X ) and joint pmf of (X − Y, Y ).
(3) Using the results in (2), obtain the pmf of X , pmf of Y , and pmf of X − Y .
(4) Obtain the joint pmf of (X + Y, X − Y ), and using the result, obtain the pmf of
X − Y and pmf of X + Y . Compare the results with those obtained in (1).
Exercise 4.18 Consider Exercise 2.30. Let R_n be the number of type O cells at n + 1/2 minutes after the start of the culture. Obtain E{R_n}, the pmf p_2(k) of R_2, and the probability η_0 that nothing will remain in the culture.
Exercise 4.19 Obtain the conditional expected value E{X |Y = y} in Example 4.4.3.
Exercise 4.20 Consider an i.i.d. random vector X = (X_1, X_2, X_3) with marginal pdf f(x) = e^{−x}u(x). Obtain the joint pdf f_Y(y_1, y_2, y_3) of Y = (Y_1, Y_2, Y_3), where Y_1 = X_1 + X_2 + X_3, Y_2 = (X_1 + X_2)/(X_1 + X_2 + X_3), and Y_3 = X_1/(X_1 + X_2).
Exercise 4.21 Consider two i.i.d. random variables X 1 and X 2 with marginal pdf
f (x) = u(x)u(1 − x). Obtain the joint pdf of Y = (Y1 , Y2 ), pdf of Y1 , and pdf of Y2
when Y1 = X 1 + X 2 and Y2 = X 1 − X 2 .
Exercise 4.22 When Y = (Y1 , Y2 ) is obtained from rotating clockwise a point X =
(X 1 , X 2 ) in the two dimensional plane by θ , express the pdf of Y in terms of the pdf
f X of X.
Exercise 4.23 Assume that the value of the joint pdf f_{X,Y}(x, y) of X and Y is positive in a region containing x² + y² < a², where a > 0. Express the conditional joint cdf F_{X,Y|A} and conditional joint pdf f_{X,Y|A} in terms of f_{X,Y} when A = {X² + Y² ≤ a²}.

Exercise 4.24 The joint pdf of (X, Y ) is f X,Y (x, y) = 41 u (1 − |x|) u (1 − |y|).
 
When A = X 2 + Y 2 ≤ a 2 with 0 < a < 1, obtain the conditional joint cdf FX,Y |A
and conditional joint pdf f X,Y |A .

Exercise 4.25 Prove the following results:


(1) If X and Z are not orthogonal, then there exists a constant a for which Z and
X − a Z are orthogonal.
(2) It is possible that X and Y are uncorrelated even when X and Z are correlated
and Y and Z are correlated.

Exercise 4.26 Prove the following results:


(1) If X and Y are independent of each other, then they are uncorrelated.
(2) If the pdf f X of X is an even function, then X and X 2 are uncorrelated but are
not independent of each other.

Exercise 4.27 Show that

E{max(X², Y²)} ≤ 1 + √(1 − ρ²), (4.E.4)

where ρ is the correlation coefficient between the random variables X and Y, both with zero mean and unit variance.

Exercise 4.28 Consider a random vector X = (X_1, X_2, X_3)^T with covariance matrix

K_X = ( 2 1 1
        1 2 1
        1 1 2 ).

Obtain a linear transformation making X into an uncorrelated random vector with unit variance.

Exercise 4.29 Obtain the pdf of Y when the joint pdf of (X, Y) is f_{X,Y}(x, y) = (1/y) exp(−y − x/y) u(x)u(y).

Exercise 4.30 When the joint pmf of (X, Y) is

p_{X,Y}(x, y) = { 1/2, (x, y) = (1, 1),
               { 1/8, (x, y) = (1, 2) or (2, 2),
               { 1/4, (x, y) = (2, 1),            (4.E.5)
               { 0,   otherwise,

obtain the pmf of X and pmf of Y.

Exercise 4.31 For two i.i.d. random variables X_1 and X_2 with marginal pmf p(x) = e^{−λ} λ^x/x! ũ(x), where λ > 0, obtain the pmf of M = max(X_1, X_2) and pmf of N = min(X_1, X_2).

Exercise 4.32 For two i.i.d. random variables X and Y with marginal pdf f (z) =
u(z) − u(z − 1), obtain the pdf’s of W = 2X , U = −Y , and Z = W + U .

Exercise 4.33 For three i.i.d. random variables X_1, X_2, and X_3 with marginal distribution U(−1/2, 1/2), obtain the pdf of Y = X_1 + X_2 + X_3 and E{Y⁴}.
tribution U − 21 , 21 , obtain the pdf of Y = X 1 + X 2 + X 3 and E Y 4 .

Exercise 4.34 The random variables {X_i}_{i=1}^{n} are independent of each other with pdf's {f_i}_{i=1}^{n}, respectively. Obtain the joint pdf of {Y_k}_{k=1}^{n}, where Y_k = X_1 + X_2 + ··· + X_k for k = 1, 2, ..., n.

Exercise 4.35 The joint pmf of X and Y is

p_{X,Y}(x, y) = { (x + y)/32, x = 1, 2, y = 1, 2, 3, 4,
               { 0,          otherwise.             (4.E.6)

(1) Obtain the pmf of X and pmf of Y .


(2) Obtain P(X > Y ), P(Y = 2X ), P(X + Y = 3), and P(X ≤ 3 − Y ).
(3) Discuss whether or not X and Y are independent of each other.

Exercise 4.36 For independent random variables X 1 and X 2 with pdf’s f X 1 (x) =
u(x)u(1 − x) and f X 2 (x) = e−x u(x), obtain the pdf of Y = X 1 + X 2 .

Exercise 4.37 Three Poisson random variables X 1 , X 2 , and X 3 with means 2, 1, and
4, respectively, are independent of each other.
(1) Obtain the mgf of Y = X 1 + X 2 + X 3 .
(2) Obtain the distribution of Y .

Exercise 4.38 When the joint pdf of X , Y , and Z is f X,Y,Z (x, y, z) = k(x + y +
z)u(x)u(y)u(z)u(1 − x)u(1 − y)u(1 − z), determine the constant k and obtain the
conditional pdf f Z |X,Y (z|x, y).

Exercise 4.39 Consider a random variable with probability measure

P(X = x) = { λ^x e^{−λ}/x!, x = 0, 1, 2, ...,
           { 0,             otherwise.        (4.E.7)

Here, λ is a realization of a random variable Λ with pdf f_Λ(v) = e^{−v}u(v). Obtain E{e^{−Λ} | X = 1}.

Exercise 4.40 When U1 , U2 , and U3 are independent of each other, obtain the joint
pdf f X,Y,Z (x, y, z) of X = U1 , Y = U1 + U2 , and Z = U1 + U2 + U3 in terms of
the pdf’s of U1 , U2 , and U3 .

Exercise 4.41 Let (X, Y, Z ) be the rectangular coordinate of a randomly chosen


point in a sphere of radius 1 centered at the origin in the three dimensional space.
(1) Obtain the joint pdf f X,Y (x, y) and marginal pdf f X (x).
(2) Obtain the conditional joint pdf f X,Y |Z (x, y|z). Are X , Y , and Z independent of
each other?

Exercise 4.42 Consider a random vector (X, Y ) with joint pdf f X,Y (x, y) =
c u (r − |x| − |y|), where c is a constant and r > 0.
(1) Express c in terms of r and obtain the pdf f X (x).
(2) Are X and Y independent of each other?
(3) Obtain the pdf of Z = |X | + |Y |.
Exercise 4.43 Assume X with cdf F_X and Y with cdf F_Y are independent of each other. Show that P(X ≥ Y) ≥ 1/2 when F_X(x) ≤ F_Y(x) at every point x.

Exercise 4.44 The joint pdf of (X, Y) is f_{X,Y}(x, y) = c(x² + y²) u(x)u(y)u(1 − x² − y²).
(1) Determine the constant c and obtain the pdf of X and pdf of Y. Are X and Y independent of each other?
(2) Obtain the joint pdf f_{R,Θ} of R = √(X² + Y²) and Θ = tan⁻¹(Y/X).
(3) Obtain the pmf of the output Q = q(R, Θ) of the polar quantizer, where

q(r, θ) = { k,     if 0 ≤ r ≤ (1/2)^{1/4}, π(k − 1)/8 ≤ θ ≤ πk/8,
          { k + 4, if (1/2)^{1/4} ≤ r ≤ 1, π(k − 1)/8 ≤ θ ≤ πk/8    (4.E.8)

for k = 1, 2, 3, 4.
Exercise 4.45 Two types of batteries have the pdf f (x) = 3λx 2 exp(−λx 3 )u(x) and
g(y) = 3μy 2 exp(−μy 3 )u(y), respectively, of lifetime with μ > 0 and λ > 0. When
the lifetimes of batteries are independent of each other, obtain the probability that
the battery with pdf f of lifetime lasts longer than that with g, and obtain the value
when λ = μ.
Exercise 4.46 Two i.i.d. random variables X and Y have marginal pdf f(x) = e^{−x}u(x).
(1) Obtain the pdf of each of U = X + Y, V = X − Y, XY, X/Y, Z = X/(X + Y), min(X, Y), max(X, Y), and min(X, Y)/max(X, Y).
(2) Obtain the conditional pdf of V when U = u.
(3) Show that U and Z are independent of each other.
Exercise 4.47 Two Poisson random variables X_1 ∼ P(λ_1) and X_2 ∼ P(λ_2) are independent of each other.
(1) Show that X_1 + X_2 ∼ P(λ_1 + λ_2).
(2) Show that the conditional distribution of X_1 when X_1 + X_2 = n is b(n, λ_1/(λ_1 + λ_2)).

Exercise 4.48 Consider Exercise 2.17.


(1) Obtain the mean and variance of the number M of matches.
(2) Assume that the students with matches will leave with their balls, and each of
the remaining students will pick a ball again after their balls are mixed. Show
that the mean of the number of repetitions until every student has a match is N .

Exercise 4.49 A particle moves back and forth between positions 0, 1, . . . , n. At


any position, it moves to the previous or next position with probability 1 − p or p,
respectively, after 1 second. At positions 0 and n, however, it moves only to the next
position 1 and previous position n − 1, respectively. Obtain the expected value of
the time for the particle to move from position 0 to position n.
Exercise 4.50 Let N be the number of tosses of a coin with probability p of head
until we have two head’s in the last three tosses: we let N = 2 if the first two outcomes
are both head’s. Obtain the expected value of N .
Exercise 4.51 Two people A₁ and A₂, with probabilities p₁ and p₂ of a hit, respectively, alternately fire at a target until the target has been hit two times consecutively.
(1) Obtain the mean number μi of total shots fired at the target when Ai starts the
shooting for i = 1, 2.
(2) Obtain the mean number h i of times the target has been hit when Ai starts the
shooting for i = 1, 2.
Exercise 4.52 Consider Exercise 4.51, but now assume that the game ends when
the target is hit twice (i.e., consecutiveness is unnecessary). When A1 starts, obtain
the probability α1 that A1 fires the last shot of the game and the probability α2 that
A1 makes both hits.
Exercise 4.53 Assume i.i.d. random variables X 1 , X 2 , . . . with marginal distribution
U [0, 1). Let g(x) = E{N }, where N = min {n : X n < X n−1 } and X 0 = x. Obtain
an integral equation for g(x) conditional on X 1 , and solve the equation.
Exercise 4.54 We repeat tossing a coin with probability p of head. Let X be the
number of repetitions until head appears three times consecutively.
(1) Obtain a difference equation for g(k) = P(X = k).
(2) Obtain the generating function G_X(s) = E{s^X}.
(3) Obtain E{X }. (Hint. Use conditional expected value.)
Exercise 4.55 Obtain the conditional joint cdf FX,Y |A (x, y) and conditional joint
pdf f X,Y |A (x, y) when A = {x1 < X ≤ x2 }.
Exercise 4.56 For independent random variables X and Y, assume the pmf

p_X(x) = { 1/6, x = 3,
         { 1/2, x = 4,   (4.E.9)
         { 1/3, x = 5

of X and pmf

p_Y(y) = { 1/2, y = 0,
         { 1/2, y = 1   (4.E.10)

of Y. Obtain the conditional pmf's p_{X|Z}, p_{Z|X}, p_{Y|Z}, and p_{Z|Y} and the joint pmf's p_{X,Y}, p_{Y,Z}, and p_{X,Z} when Z = X − Y.

Exercise 4.57 Two exponential random variables T1 and T2 with rate λ1 and λ2 ,
respectively, are independent of each other. Let U = min (T1 , T2 ), V = max (T1 , T2 ),
and I be the smaller index, i.e., the index I such that TI = U .
(1) Obtain the expected values E{U }, E{V − U }, and E{V }.
(2) Obtain E{V } using V = T1 + T2 − U .
(3) Obtain the joint pdf fU,V −U,I of (U, V − U, I ).
(4) Are U and V − U independent of each other?

Exercise 4.58 Consider a bi-variate beta random vector (X, Y) with joint pdf

f_{X,Y}(x, y) = {Γ(p₁ + p₂ + p₃)/(Γ(p₁)Γ(p₂)Γ(p₃))} x^{p₁−1} y^{p₂−1} (1 − x − y)^{p₃−1} u(x)u(y)u(1 − x − y),   (4.E.11)

where p₁, p₂, and p₃ are positive numbers. Obtain the pdf f_X of X, pdf f_Y of Y, conditional pdf f_{X|Y}, and conditional pdf f_{Y|X}. In addition, obtain the conditional pdf f_{Y/(1−X)|X} of Y/(1 − X) when X is given.

Exercise 4.59 Assuming the joint pdf f_{X,Y}(x, y) = (1/16) u(2 − |x|) u(2 − |y|) of (X, Y), obtain the conditional joint cdf F_{X,Y|B} and conditional joint pdf f_{X,Y|B} when B = {|X| + |Y| ≤ 1}.

Exercise 4.60 Let the joint pdf of X and Y be f_{X,Y}(x, y) = |xy|u(1 − |x|)u(1 − |y|). When A = {X² + Y² ≤ a²} with 0 < a < 1, obtain the conditional joint cdf F_{X,Y|A} and conditional joint pdf f_{X,Y|A}.

Exercise 4.61 For a random vector X = (X 1 , X 2 , . . . , X n ), show that


 
E{ X R⁻¹ Xᵀ } = n,   (4.E.12)

where R is the correlation matrix of X and R−1 is the inverse matrix of R.

Exercise 4.62 When the cf of (X, Y ) is ϕ X,Y (t, s), show that the cf of Z = a X + bY
is ϕ X,Y (at, bt).

Exercise 4.63 The joint pdf of (X, Y ) is

f_{X,Y}(x, y) = [n!/{(i − 1)!(k − i − 1)!(n − k)!}] F^{i−1}(x){F(y) − F(x)}^{k−i−1}{1 − F(y)}^{n−k} f(x) f(y) u(y − x),   (4.E.13)

where i, k, and n are natural numbers such that 1 ≤ i < k ≤ n, F is the cdf of a
random variable, and f (t) = dtd F(t). Obtain the pdf of X and pdf of Y .

Exercise 4.64 The number N of typographical errors in a book is a Poisson ran-


dom variable with mean λ. Proofreaders A and B find a typographical error with
probability p1 and p2 , respectively. Let X 1 , X 2 , X 3 , and X 4 be the numbers of typo-
graphical errors found by Proofreader A but not by Proofreader B, by Proofreader B
but not by Proofreader A, by both proofreaders, and by neither proofreader, respec-
tively. Assume that the event of a typographical error being found by a proofreader
is independent of that by another proofreader.
(1) Obtain the joint pmf of X 1 , X 2 , X 3 , and X 4 .
(2) Show that

E{X₁}/E{X₃} = (1 − p₂)/p₂,   E{X₂}/E{X₃} = (1 − p₁)/p₁.   (4.E.14)

Now assume that the values of p1 , p2 , and λ are not available.


(3) Using X i as the estimate of E {X i } for i = 1, 2, 3, obtain the estimates of p1 ,
p2 , and λ.
(4) Obtain an estimate of X 4 .

Exercise 4.65 Show that the correlation coefficient between X and |X | is


ρ_{X|X|} = ∫_{−∞}^{∞} |x|(x − m_X) f(x) dx / { √(σ_X²) √(σ_X² + 4m_X⁺m_X⁻) },   (4.E.15)

where m_X^±, f, m_X, and σ_X² are the half means defined in (3.E.28), pdf, mean, and variance, respectively, of X. Obtain the value of ρ_{X|X|} and compare it with what can be obtained intuitively in each of the following cases of the pdf f_X(x) of X:
(1) f X (x) is an even function.
(2) f X (x) > 0 only for x ≥ 0.
(3) f X (x) > 0 only for x ≤ 0.

Exercise 4.66 For a random variable X with pdf f X (x) = u(x) − u(x − 1), obtain
the joint pdf of X and Y = 2X + 1.

Exercise 4.67 Consider a random variable X and its magnitude Y = |X |. Show that
the conditional pdf f X |Y can be expressed as

f_{X|Y}(x|y) = [f_X(x)δ(x + y)/{f_X(x) + f_X(−x)}] u(−x) + [f_X(x)δ(x − y)/{f_X(x) + f_X(−x)}] u(x)   (4.E.16)

for y ∈ {y | { f X (y) + f X (−y)} u(y) > 0}, where f X is the pdf of X . Obtain the
conditional pdf f Y |X (y|x). (Hint. Use (4.5.15).)

Exercise 4.68 Show that the joint cdf and joint pdf are

F_{X,cX+a}(x, y) = { F_X(x) u((y−a)/c − x) + F_X((y−a)/c) u(x − (y−a)/c),  c > 0,
                  { F_X(x) u(y − a),                                       c = 0,   (4.E.17)
                  { {F_X(x) − F_X((y−a)/c)} u(x − (y−a)/c),                c < 0

and

f_{X,cX+a}(x, y) = { (1/|c|) f_X(x) δ((y−a)/c − x),  c ≠ 0,   (4.E.18)
                  { f_X(x) δ(y − a),                 c = 0,

respectively, for X and Y = cX + a.

Exercise 4.69 Let f and F be the pdf and cdf, respectively, of a continuous random variable X, and let Y = X².
(1) Obtain the joint cdf F_{X,Y}.
(2) Obtain the joint pdf f_{X,Y}, and then confirm ∫_{−∞}^{∞} f_{X,Y}(x, y) dy = f(x) and

∫_{−∞}^{∞} f_{X,Y}(x, y) dx = (1/(2√y)) { f(√y) + f(−√y) } u(y)   (4.E.19)

by integration.
(3) Obtain the conditional pdf f_{X|Y}.

Exercise 4.70 Show that the pdf f_{X,cX} shown in (4.5.9) satisfies ∫_{−∞}^{∞}∫_{−∞}^{∞} f_{X,cX}(x, y) dy dx = 1.

Exercise 4.71 Express the joint cdf and joint pdf of the input X and output Y =
X u(X ) of a half-wave rectifier in terms of the pdf f X and cdf FX of X .

Exercise 4.72 Obtain (4.5.2) from FX,X +a (x, y) = FX (min(x, y − a)) shown in
(4.5.1).

Exercise 4.73 Assume that the joint pdf of X = (X 1 , X 2 ) is

f X (x, y) = c x u(x)u(1 − x + y)u(1 − x − y). (4.E.20)

Determine c. Obtain and sketch the pdf’s of X 1 and X 2 .

Exercise 4.74 Consider a random vector X = (X₁, X₂) with the pdf f_X(x₁, x₂) = u(x₁)u(1 − x₁)u(x₂)u(1 − x₂).
(1) Obtain the joint pdf f_Y of Y = (Y₁, Y₂) = (X₁ − X₂, X₁² − X₂²).
(2) Obtain the pdf f_{Y₁} of Y₁ and pdf f_{Y₂} of Y₂ from f_Y.
(3) Compare the pdf f_{Y₂} of Y₂ with that we can obtain from

f_{aX²}(y) = (1/(2√(ay))) { f_X(√(y/a)) + f_X(−√(y/a)) } u(y)   (4.E.21)

for a > 0 shown in (3.2.35) and

f_{X₁−X₂}(y) = ∫_{−∞}^{∞} f_{X₁,X₂}(y + y₂, y₂) dy₂   (4.E.22)

shown in (4.2.20).

References

M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
J. Bae, H. Kwon, S.R. Park, J. Lee, I. Song, Explicit correlation coefficients among random variables,
ranks, and magnitude ranks. IEEE Trans. Inform. Theory 52(5), 2233–2240 (2006)
N. Balakrishnan, Handbook of the Logistic Distribution (Marcel Dekker, New York, 1992)
D.L. Burdick, A note on symmetric random variables. Ann. Math. Stat. 43(6), 2039–2040 (1972)
W.B. Davenport Jr., Probability and Random Processes (McGraw-Hill, New York, 1970)
H.A. David, H.N. Nagaraja, Order Statistics, 3rd edn. (Wiley, New York, 2003)
A.P. Dawid, Some misleading arguments involving conditional independence. J. R. Stat. Soc. Ser.
B (Methodological) 41(2), 249–252 (1979)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
S. Geisser, N. Mantel, Pairwise independence of jointly dependent variables. Ann. Math. Stat. 33(1),
290–291 (1962)
R.M. Gray, L.D. Davisson, An Introduction to Statistical Signal Processing (Cambridge University
Press, Cambridge, 2010)
R.A. Horn, C.R. Johnson, Matrix Analysis (Cambridge University Press, Cambridge, 1985)
N.L. Johnson, S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions (Wiley,
New York, 1972)
S.A. Kassam, Signal Detection in Non-Gaussian Noise (Springer, New York, 1988)
S.M. Kendall, A. Stuart, Advanced Theory of Statistics, vol. II (Oxford University, New York, 1979)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn.
(Prentice Hall, New York, 2008)
K.V. Mardia, Families of Bivariate Distributions (Charles Griffin and Company, London, 1970)
R.N. McDonough, A.D. Whalen, Detection of Signals in Noise, 2nd edn. (Academic, New York,
1995)
P.T. Nielsen, On the expected duration of a search for a fixed pattern in random data. IEEE Trans.
Inform. Theory 19(5), 702–704 (1973)
A. Papoulis, S.U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn.
(McGraw-Hill, New York, 2002)
V.K. Rohatgi, A.K.Md.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
S.M. Ross, A First Course in Probability (Macmillan, New York, 1976)

S.M. Ross, Stochastic Processes, 2nd edn. (Wiley, New York, 1996)
S.M. Ross, Introduction to Probability Models, 10th edn. (Academic, Boston, 2009)
G. Samorodnitsky, M.S. Taqqu, Non-Gaussian Random Processes: Stochastic Models with Infinite
Variance (Chapman and Hall, New York, 1994)
I. Song, J. Bae, S.Y. Kim, Advanced Theory of Signal Detection (Springer, Berlin, 2002)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
A. Stuart, J.K. Ord, Advanced Theory of Statistics: Vol. 1. Distribution Theory, 5th edn. (Oxford University, New York, 1987)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
Y.H. Wang, Dependent random variables with independent subsets. Am. Math. Mon. 86(4), 290–292 (1979)
G.L. Wise, E.B. Hall, Counterexamples in Probability and Real Analysis (Oxford University, New York, 1993)
Chapter 5
Normal Random Vectors

In this chapter, we consider normal random vectors in the real space. We first describe
the pdf and cf of normal random vectors, and then consider the special cases of bi-
variate and tri-variate normal random vectors. Some key properties of normal random
vectors are then discussed. The expected values of non-linear functions of normal
random vectors are also investigated, during which an explicit closed form for joint
moments is presented. Additional topics related to normal random vectors are then
briefly described.

5.1 Probability Functions

Let us first describe the pdf and cf of normal random vectors (Davenport 1970; Kotz
et al. 2000; Middleton 1960; Patel et al. 1976) in general. We then consider additional
topics in the special cases of bi-variate and tri-variate normal random vectors.

5.1.1 Probability Density Function and Characteristic


Function

Definition 5.1.1 (normal random vector) A vector X = (X₁, X₂, . . . , Xₙ)ᵀ is called an n-dimensional, n-variable, or n-variate normal random vector if it has the joint pdf

f_X(x) = 1/√((2π)ⁿ|K|) exp{ −(1/2)(x − m)ᵀ K⁻¹ (x − m) },   (5.1.1)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8_5

where m = (m₁, m₂, . . . , mₙ)ᵀ and K is an n × n Hermitian matrix with the determinant |K| ≥ 0. The distribution is denoted by N(m, K).

When m = (0, 0, . . . , 0)T and all the diagonal elements of K are 1, the normal
distribution is called a standard normal distribution. We will in most cases assume
|K | > 0: when |K | = 0, the distribution is called a degenerate distribution and will
be discussed briefly in Theorems 5.1.3, 5.1.4, and 5.1.5.
The distribution of a normal random vector is often called a jointly normal distri-
bution and a normal random vector is also called jointly normal random variables.
It should be noted that ‘jointly normal random variables’ and ‘normal random vari-
ables’ are strictly different. Specifically, the term ‘jointly normal random variables’
is a synonym for ‘a normal random vector’. However, the term ‘normal random vari-
ables’ denotes several random variables with marginal normal distributions which
may or may not be a normal random vector. In fact, all the components of a non-
Gaussian random vector may be normal random variables in some cases as we shall
see in Example 5.2.3 later, for instance.

Example 5.1.1 For a random vector X ∼ N (m, K ), the mean vector, covariance
matrix, and correlation matrix are m, K , and R = K + m m T , respectively. ♦

Theorem 5.1.1 The joint cf of X = (X₁, X₂, . . . , Xₙ)ᵀ ∼ N(m, K) is

φ_X(ω) = exp{ j mᵀω − (1/2) ωᵀKω },   (5.1.2)

where ω = (ω₁, ω₂, . . . , ωₙ)ᵀ.

Proof Letting α = {(2π)ⁿ|K|}^{−1/2} and y = (y₁, y₂, . . . , yₙ)ᵀ = x − m, the cf φ_X(ω) = E{exp(jωᵀX)} of X can be calculated as

φ_X(ω) = α ∫_{x∈Rⁿ} exp{ −(1/2)(x − m)ᵀK⁻¹(x − m) } exp(jωᵀx) dx
       = α exp(jωᵀm) ∫_{y∈Rⁿ} exp{ −(1/2)(yᵀK⁻¹y − 2jωᵀy) } dy.   (5.1.3)

Now, recollecting that ωᵀy = (ωᵀy)ᵀ = yᵀω because ωᵀy is a scalar and that K = Kᵀ, we have (y − jKᵀω)ᵀK⁻¹(y − jKᵀω) = yᵀK⁻¹y − jyᵀK⁻¹Kᵀω − jωᵀy − ωᵀKᵀω, i.e.,

(y − jKᵀω)ᵀK⁻¹(y − jKᵀω) = yᵀK⁻¹y − 2jωᵀy − ωᵀKω.   (5.1.4)

Thus, letting z = (z₁, z₂, . . . , zₙ)ᵀ = y − jKᵀω, recollecting that ωᵀm = (ωᵀm)ᵀ = mᵀω because ωᵀm is a scalar, and using (5.1.4), we get

φ_X(ω) = α exp(jωᵀm) ∫_{y∈Rⁿ} exp[ −(1/2){ (y − jKᵀω)ᵀK⁻¹(y − jKᵀω) + ωᵀKω } ] dy
       = α exp(jωᵀm) exp{ −(1/2)ωᵀKω } ∫_{z∈Rⁿ} exp{ −(1/2)zᵀK⁻¹z } dz
       = exp{ j mᵀω − (1/2)ωᵀKω }   (5.1.5)

from (5.1.3). ♠
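The closed form (5.1.2) can also be checked numerically. The sketch below (the values of m, K, and ω are arbitrary choices, not taken from the text) compares a Monte Carlo estimate of E{exp(jωᵀX)} against the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, -0.5])
K = np.array([[2.0, 0.6],
              [0.6, 1.0]])
w = np.array([0.3, -0.7])

# Monte Carlo estimate of the cf E{exp(j w^T X)} for X ~ N(m, K).
X = rng.multivariate_normal(m, K, size=400_000)
empirical_cf = np.exp(1j * (X @ w)).mean()

# Closed form (5.1.2): exp(j m^T w - (1/2) w^T K w).
theoretical_cf = np.exp(1j * (m @ w) - 0.5 * (w @ K @ w))
```

With a few hundred thousand samples the two complex numbers agree to about two decimal places.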

5.1.2 Bi-variate Normal Random Vectors

Let the covariance matrix of a bi-variate normal random vector X = (X₁, X₂) be

K₂ = [ σ₁²     ρσ₁σ₂ ]
     [ ρσ₁σ₂   σ₂²   ],   (5.1.6)

where ρ is the correlation coefficient between X₁ and X₂ with |ρ| ≤ 1. Then, we have the determinant |K₂| = σ₁²σ₂²(1 − ρ²) of K₂, the inverse

K₂⁻¹ = 1/{σ₁²σ₂²(1 − ρ²)} [ σ₂²      −ρσ₁σ₂ ]
                           [ −ρσ₁σ₂   σ₁²    ]   (5.1.7)

of K₂, and the joint pdf

f_{X₁,X₂}(x, y) = 1/(2πσ₁σ₂√(1 − ρ²)) exp[ −1/(2(1 − ρ²)) { (x − m₁)²/σ₁² − 2ρ(x − m₁)(y − m₂)/(σ₁σ₂) + (y − m₂)²/σ₂² } ]   (5.1.8)

of (X₁, X₂). The distribution N((m₁, m₂)ᵀ, K₂) is often also denoted by N(m₁, m₂, σ₁², σ₂², ρ). In (5.1.7) and the joint pdf (5.1.8), it is assumed that |ρ| < 1: the pdf for ρ → ±1 will be discussed later in Theorem 5.1.3.

The contour or isohypse of the bi-variate normal pdf (5.1.8) is an ellipse: specifically, referring to Exercise 5.42, the equation of the ellipse containing 100α% of the distribution can be expressed as

(x − m₁)²/σ₁² − 2ρ(x − m₁)(y − m₂)/(σ₁σ₂) + (y − m₂)²/σ₂² = −2(1 − ρ²) ln(1 − α).

As shown in Exercise 5.44, the major axis of the ellipse makes the angle

θ = (1/2) tan⁻¹{ 2ρσ₁σ₂/(σ₁² − σ₂²) }   (5.1.9)

with the positive x-axis. For ρ > 0, we have 0 < θ < π/4, θ = π/4, and π/4 < θ < π/2 when σ₁ > σ₂, σ₁ = σ₂, and σ₁ < σ₂, respectively, as shown in Figs. 5.1, 5.2, and 5.3. Denoting the standard bi-variate normal pdf by f₂, we have

f₂(0, y) = f₂(0, 0) exp{ −y²/(2(1 − ρ²)) },   (5.1.10)

where f₂(0, 0) = {2π√(1 − ρ²)}⁻¹.

Fig. 5.1 Contour of bi-variate normal pdf when ρ > 0 and σ₁ > σ₂, in which case 0 < θ < π/4

Fig. 5.2 Contour of bi-variate normal pdf when ρ > 0 and σ₁ = σ₂, in which case θ = π/4

Fig. 5.3 Contour of bi-variate normal pdf when ρ > 0 and σ₁ < σ₂, in which case π/4 < θ < π/2
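The angle (5.1.9) can be cross-checked numerically: the major axis of the contour ellipse is the eigenvector of K₂ belonging to its largest eigenvalue. A small sketch (the values of σ₁, σ₂, and ρ are arbitrary):

```python
import math
import numpy as np

sigma1, sigma2, rho = 2.0, 1.0, 0.5
K2 = np.array([[sigma1**2,             rho * sigma1 * sigma2],
               [rho * sigma1 * sigma2, sigma2**2]])

# Angle of the major axis from (5.1.9); atan2 reduces to tan^{-1} here
# because sigma1 > sigma2 makes the denominator positive.
theta_formula = 0.5 * math.atan2(2 * rho * sigma1 * sigma2,
                                 sigma1**2 - sigma2**2)

# The major axis is the top eigenvector of K2
# (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(K2)
v = eigvecs[:, -1]
theta_eig = math.atan2(v[1], v[0]) % math.pi
```

Both computations give the same angle in (0, π/4), as the text predicts for ρ > 0 and σ₁ > σ₂.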

Example 5.1.2 By integrating the joint pdf (5.1.8) over y and x, it is easy to see that we have X ∼ N(m₁, σ₁²) and Y ∼ N(m₂, σ₂²), respectively, when (X, Y) ∼ N(m₁, m₂, σ₁², σ₂², ρ). In other words, two jointly normal random variables are also individually normal, which is a special case of Theorem 5.2.1. ♦

Example 5.1.3 For (X, Y) ∼ N(m₁, m₂, σ₁², σ₂², ρ), X and Y are independent if ρ = 0. That is, two uncorrelated jointly normal random variables are independent, which will be generalized in Theorem 5.2.3. ♦

Example 5.1.4 From Theorem 5.1.1, the joint cf is φ_{X,Y}(u, v) = exp[ j(m₁u + m₂v) − (1/2){ σ₁²u² + 2ρσ₁σ₂uv + σ₂²v² } ] for (X, Y) ∼ N(m₁, m₂, σ₁², σ₂², ρ). ♦

Example 5.1.5 Assume that X₁ ∼ N(0, 1) and X₂ ∼ N(0, 1) are independent. Then, the pdf

f_Y(y) = (1/2π) ∫_{−∞}^{∞} exp{ −(1/2)x² − (1/2)(y − x)² } dx
       = (1/2π) ∫_{−∞}^{∞} exp{ −(x − y/2)² − (1/4)y² } dx
       = (1/(2√π)) exp{ −(1/4)y² } ∫_{−∞}^{∞} (1/√(2π)) exp{ −(1/2)v² } dv

of Y = X₁ + X₂ can eventually be obtained as

f_Y(y) = (1/(2√π)) exp(−y²/4)   (5.1.11)

using (4.2.37) and letting v = √2(x − y/2). In other words, the sum of two independent, standard normal random variables is an N(0, 2) random variable. In general, when X₁ ∼ N(m₁, σ₁²), X₂ ∼ N(m₂, σ₂²), and X₁ and X₂ are independent of each other, we have Y = X₁ + X₂ ∼ N(m₁ + m₂, σ₁² + σ₂²). A further generalization of this result is expressed as Theorem 5.2.5 later. ♦
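The convolution behind (5.1.11) can also be verified on a grid: numerically convolving two standard normal pdfs reproduces the N(0, 2) density. (Grid limits and spacing below are arbitrary choices.)

```python
import numpy as np

dx = 0.01
x = np.arange(-10, 10, dx)
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal pdf

# Discrete convolution approximates the pdf of X1 + X2.
conv = np.convolve(phi, phi) * dx
y = -20 + dx * np.arange(conv.size)            # grid of sum values

# (5.1.11): the N(0, 2) pdf.
target = np.exp(-y**2 / 4) / (2 * np.sqrt(np.pi))
max_err = float(np.max(np.abs(conv - target)))
```

The maximum pointwise discrepancy is negligible, confirming the closed form.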

Example 5.1.6 Obtain the pdf of Z = X/Y when X ∼ N(0, 1) and Y ∼ N(0, 1) are independent of each other.

Solution Because X and Y are independent of each other, we get the pdf

f_Z(z) = ∫_{−∞}^{∞} |y| f_Y(y) f_X(zy) dy = ∫_{−∞}^{∞} |y| (1/2π) exp{ −(1/2)(z² + 1)y² } dy
       = (1/π) ∫_{0}^{∞} y exp{ −(1/2)(z² + 1)y² } dy

of Z = X/Y using (4.2.27). Next, letting (1/2)(z² + 1)y² = t, we have the pdf of Z as

f_Z(z) = (1/π) · 1/(z² + 1) ∫_{0}^{∞} e^{−t} dt = (1/π) · 1/(z² + 1).

In other words, Z is a Cauchy random variable. ♦
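A quick simulation agrees with the Cauchy conclusion: the empirical cdf of X/Y matches F(z) = 1/2 + tan⁻¹(z)/π at a few test points. (Sample size, seed, and test points are arbitrary.)

```python
import math
import numpy as np

rng = np.random.default_rng(7)
n = 500_000
z = rng.standard_normal(n) / rng.standard_normal(n)   # Z = X / Y

# Compare the empirical cdf of Z with the Cauchy cdf at a few quantiles.
max_err = max(abs(float((z <= q).mean()) - (0.5 + math.atan(q) / math.pi))
              for q in (-1.0, 0.0, 2.0))
```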

Based on the results in Examples 4.2.10 and 5.1.6, we can show the following
theorem:
 
Theorem 5.1.2 When (X, Y) ∼ N(0, 0, σ_X², σ_Y², ρ), we have the pdf

f_Z(z) = σ_Xσ_Y√(1 − ρ²) / [ π{ σ_Y²z² − 2ρσ_Xσ_Y z + σ_X² } ]   (5.1.12)

and cdf

F_Z(z) = 1/2 + (1/π) tan⁻¹{ (σ_Y z − ρσ_X) / (σ_X√(1 − ρ²)) }   (5.1.13)

of Z = X/Y.
Proof Let α = {2πσ_Xσ_Y√(1 − ρ²)}⁻¹ and β = 1/(2(1 − ρ²)) { z²/σ_X² − 2ρz/(σ_Xσ_Y) + 1/σ_Y² }. Using (4.2.27), we get

f_Z(z) = ∫_{−∞}^{∞} |v| f_{X,Y}(zv, v) dv
       = α ∫_{−∞}^{∞} |v| exp[ −1/(2(1 − ρ²)) { z²v²/σ_X² − 2ρzv²/(σ_Xσ_Y) + v²/σ_Y² } ] dv,

i.e.,

f_Z(z) = 2α ∫_{0}^{∞} v exp(−βv²) dv.   (5.1.14)

Thus, noting that ∫₀^∞ v exp(−βv²) dv = [ −(1/(2β)) exp(−βv²) ]₀^∞ = 1/(2β), we get the pdf of Z as f_Z(z) = α/β = σ_Xσ_Y√(1 − ρ²) / [ π{ σ_Y²z² − 2ρσ_Xσ_Y z + σ_X² } ], which is the same as (5.1.12).

Next, if we let tan θ_z = (σ_Y/(σ_X√(1 − ρ²))) (z − ρσ_X/σ_Y) for convenience, the cdf of Z can be obtained as

F_Z(z) = ∫_{−∞}^{z} f_Z(t) dt = { σ_Y/(πσ_X√(1 − ρ²)) } ∫_{−∞}^{z} b(t) dt = (1/π) ∫_{−π/2}^{θ_z} dθ = (1/π)(θ_z + π/2),

leading to (5.1.13), where b(t) = [ 1 + { σ_Y²/(σ_X²(1 − ρ²)) } (t − ρσ_X/σ_Y)² ]⁻¹. ♠
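The cdf (5.1.13) can be checked by simulation; the sketch below builds correlated zero-mean normals from two independent standard normals (all parameter values are arbitrary).

```python
import math
import numpy as np

rng = np.random.default_rng(3)
sx, sy, rho = 2.0, 1.5, 0.4
n = 400_000

# (X, Y) ~ N(0, 0, sx^2, sy^2, rho) via a Cholesky-style construction.
u = rng.standard_normal(n)
v = rng.standard_normal(n)
X = sx * u
Y = sy * (rho * u + math.sqrt(1 - rho**2) * v)
Z = X / Y

def cdf_5113(z):
    # (5.1.13): F_Z(z) = 1/2 + (1/pi) arctan((sy*z - rho*sx)/(sx*sqrt(1-rho^2)))
    return 0.5 + math.atan((sy * z - rho * sx)
                           / (sx * math.sqrt(1 - rho**2))) / math.pi

max_err = max(abs(float((Z <= q).mean()) - cdf_5113(q))
              for q in (-2.0, 0.0, 1.0, 3.0))
```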

Theorem 5.1.3 When ρ → ±1, we have the limit

lim_{ρ→±1} f_{X₁,X₂}(x, y) = [ exp{ −(y − m₂)²/(2σ₂²) } / (√(2π)σ₁σ₂) ] δ( (x − m₁)/σ₁ − ξ(y − m₂)/σ₂ )   (5.1.15)

of the bi-variate normal pdf (5.1.8), where ξ = sgn(ρ).



Proof We can rewrite f_{X₁,X₂}(x, y) as

f_{X₁,X₂}(x, y) = √(α/π) · [ 1/(√(2π)σ₁σ₂) ] exp{ −(y − m₂)²/(2σ₂²) } exp[ −α{ (x − m₁)/σ₁ − ρ(y − m₂)/σ₂ }² ]   (5.1.16)

by noting that

(x − m₁)²/σ₁² − 2ρ(x − m₁)(y − m₂)/(σ₁σ₂) + (y − m₂)²/σ₂²
 = { (x − m₁)/σ₁ − ρ(y − m₂)/σ₂ }² + (1 − ρ²){ (y − m₂)/σ₂ }²,   (5.1.17)

where α = {2(1 − ρ²)}⁻¹. Now, noting that √(α/π) exp(−αx²) → δ(x) for α → ∞ as shown in Example 1.4.6 and that α → ∞ for ρ → ±1, we can obtain (5.1.15) from (5.1.16). ♠

Based on f(x)δ(x − b) = f(b)δ(x − b) shown in (1.4.42) and the property δ(ax) = (1/|a|)δ(x) shown in (1.4.49) of the impulse function, the degenerate pdf (5.1.15) can be expressed in various equivalent formulas. For instance, the term exp{ −(y − m₂)²/(2σ₂²) } in (5.1.15) can be replaced with exp{ −(x − m₁)²/(2σ₁²) } or exp{ −ξ(x − m₁)(y − m₂)/(2σ₁σ₂) }, and the term (1/(σ₁σ₂)) δ( (x − m₁)/σ₁ − ξ(y − m₂)/σ₂ ) can be replaced with δ( σ₂(x − m₁) − ξσ₁(y − m₂) ).

5.1.3 Tri-variate Normal Random Vectors

For a standard tri-variate normal random vector (X₁, X₂, X₃), let us denote the covariance matrix as

K₃ = [ 1    ρ₁₂  ρ₃₁ ]
     [ ρ₁₂  1    ρ₂₃ ]   (5.1.18)
     [ ρ₃₁  ρ₂₃  1   ]

and the pdf as f₃(x, y, z) = (8π³|K₃|)^{−1/2} exp{ −(1/2)(x y z)K₃⁻¹(x y z)ᵀ }, i.e.,

f₃(x, y, z) = (8π³|K₃|)^{−1/2} exp[ −1/(2|K₃|) { (1 − ρ₂₃²)x² + (1 − ρ₃₁²)y² + (1 − ρ₁₂²)z² + 2c₁₂xy + 2c₂₃yz + 2c₃₁zx } ],   (5.1.19)

where c_{ij} = ρ_{jk}ρ_{ki} − ρ_{ij}. Then, we have f₃(0, 0, 0) = (8π³|K₃|)^{−1/2},

|K₃| = 1 − (ρ₁₂² + ρ₂₃² + ρ₃₁²) + 2ρ₁₂ρ₂₃ρ₃₁
     = (1 − ρ_{jk}²)(1 − ρ_{ki}²) − c_{ij}²
     = α_{ij,k}²(1 − β_{ij,k}²),   (5.1.20)

and

K₃⁻¹ = (1/|K₃|) [ 1 − ρ₂₃²   c₁₂        c₃₁      ]
                [ c₁₂        1 − ρ₃₁²   c₂₃      ]   (5.1.21)
                [ c₃₁        c₂₃        1 − ρ₁₂² ],

where

α_{ij,k} = √{ (1 − ρ_{jk}²)(1 − ρ_{ki}²) }   (5.1.22)

and β_{ij,k} = −c_{ij}/√{ (1 − ρ_{jk}²)(1 − ρ_{ki}²) }, i.e.,

β_{ij,k} = (ρ_{ij} − ρ_{jk}ρ_{ki}) / α_{ij,k}   (5.1.23)

denotes the partial correlation coefficient between X_i and X_j when X_k is given.
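The identities (5.1.20) and (5.1.23) are easy to confirm numerically; the partial correlation β_{ij,k} can also be read off the inverse of K₃ as −Ψ_{ij}/√(Ψ_{ii}Ψ_{jj}), a standard identity which the sketch below cross-checks (the correlation values are arbitrary).

```python
import math
import numpy as np

r12, r23, r31 = 0.3, -0.5, 0.2
K3 = np.array([[1.0, r12, r31],
               [r12, 1.0, r23],
               [r31, r23, 1.0]])

# (5.1.22)-(5.1.23): alpha_{12,3} and the partial correlation beta_{12,3}.
alpha = math.sqrt((1 - r23**2) * (1 - r31**2))
beta = (r12 - r23 * r31) / alpha

# (5.1.20): |K3| = alpha^2 * (1 - beta^2).
det_err = abs(float(np.linalg.det(K3)) - alpha**2 * (1 - beta**2))

# The same partial correlation from the inverse matrix.
P = np.linalg.inv(K3)
beta_inv = -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])
```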


 
Example 5.1.7 Note that we have ρ_{ij} = ρ_{jk}ρ_{ki} and |K₃| = α_{ij,k}² = (1 − ρ_{jk}²)(1 − ρ_{ki}²) when β_{ij,k} = 0. In addition, for ρ_{ij} → ±1, we have ρ_{jk} → sgn(ρ_{ij})ρ_{ki} because X_i → sgn(ρ_{ij})X_j. Thus, when ρ_{ij} → ±1, we have β_{ij,k} → (1 − ρ_{jk}²)/(1 − ρ_{jk}²) = 1 and |K₃| → −{ ρ_{jk} − sgn(ρ_{ij})ρ_{ki} }² → 0. ♦
 1
Note also that f 3 (0, y, z) = f 3 (0, 0, 0) exp − 2 (0 y z)K −1
3 (0 y z)
T
= f 3 (0,
     
0, 0) exp − 2|K 3 | 1 − ρ31 y + 2c23 yz + 1 − ρ12 z , i.e.,
1 2 2 2 2

 
1 y2 2β23,1 yz
f 3 (0, y, z) = f 3 (0, 0, 0) exp −   −
2 1 − β23,1
2 1 − ρ212 α23,1

z2
+ (5.1.24)
1 − ρ231

and

f₃(0, 0, z) = f₃(0, 0, 0) exp{ −(1 − ρ₁₂²)z²/(2|K₃|) }.   (5.1.25)

Example 5.1.8 Based on (5.1.24) and (5.1.25), we have

∫_{−∞}^{∞} h₁(z) f₃(0, 0, z) dz
 = (8π³|K₃|)^{−1/2} ∫_{−∞}^{∞} h₁(z) exp{ −z²/(2Σ₂₃,₁²) } dz
 = (8π³|K₃|)^{−1/2} Σ₂₃,₁ ∫_{−∞}^{∞} h₁(Σ₂₃,₁w) exp(−w²/2) dw,

i.e.,

∫_{−∞}^{∞} h₁(z) f₃(0, 0, z) dz
 = {8π³(1 − ρ₁₂²)}^{−1/2} ∫_{−∞}^{∞} h₁( √(|K₃|/(1 − ρ₁₂²)) w ) exp(−w²/2) dw
 = {2π√(1 − ρ₁₂²)}⁻¹ E{ h₁( √(|K₃|/(1 − ρ₁₂²)) U ) }   (5.1.26)

for a uni-variate function h₁, where Σ_{ij,k}² = (1 − β_{ij,k}²)(1 − ρ_{jk}²) and U ∼ N(0, 1). We also have

∫_{−∞}^{∞}∫_{−∞}^{∞} h₂(y, z) f₃(0, y, z) dy dz
 = (8π³|K₃|)^{−1/2} ∫∫ h₂(y, z) exp[ −1/(2(1 − β₂₃,₁²)) { y²/(1 − ρ₁₂²) − 2β₂₃,₁yz/α₂₃,₁ + z²/(1 − ρ₃₁²) } ] dy dz
 = (8π³|K₃|)^{−1/2} α₂₃,₁ ∫∫ h₂( √(1 − ρ₁₂²)v, √(1 − ρ₃₁²)w ) exp[ −(v² − 2β₂₃,₁vw + w²)/(2(1 − β₂₃,₁²)) ] dv dw
 = (8π³|K₃|)^{−1/2} · 2πα₂₃,₁√(1 − β₂₃,₁²) ∫∫ h₂( √(1 − ρ₁₂²)v, √(1 − ρ₃₁²)w ) f₂(v, w)|_{ρ=β₂₃,₁} dv dw
 = (1/√(2π)) E{ h₂( √(1 − ρ₁₂²)V₁, √(1 − ρ₃₁²)V₂ ) }   (5.1.27)

for a bi-variate function h₂, where (V₁, V₂) ∼ N(0, 0, 1, 1, β₂₃,₁). The two results (5.1.26) and (5.1.27) are useful in obtaining the expected values of some non-linear functions. ♦
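As a concrete instance of (5.1.26), take h₁(z) = z²; then E{h₁(cU)} = c² with c² = |K₃|/(1 − ρ₁₂²), and direct quadrature of z²f₃(0, 0, z) using (5.1.25) should give the same number. (The correlation values below are arbitrary.)

```python
import math
import numpy as np

r12, r23, r31 = 0.3, -0.5, 0.2
K3 = np.array([[1.0, r12, r31],
               [r12, 1.0, r23],
               [r31, r23, 1.0]])
detK = float(np.linalg.det(K3))

# Left side of (5.1.26) with h1(z) = z^2, by quadrature of z^2 * f3(0,0,z).
z = np.linspace(-30.0, 30.0, 200_001)
dz = z[1] - z[0]
f3_00z = np.exp(-(1 - r12**2) * z**2 / (2 * detK)) / math.sqrt(8 * math.pi**3 * detK)
lhs = float(np.sum(z**2 * f3_00z) * dz)

# Right side: {2*pi*sqrt(1-r12^2)}^{-1} * E{(c*U)^2} = c^2 / (2*pi*sqrt(1-r12^2)).
c2 = detK / (1 - r12**2)
rhs = c2 / (2 * math.pi * math.sqrt(1 - r12**2))
```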

Denote by g_ρ(x, y, z) the standard tri-variate normal pdf f₃(x, y, z) with the covariance matrix K₃ shown in (5.1.18) so that the correlation coefficients ρ = (ρ₁₂, ρ₂₃, ρ₃₁) are shown explicitly. Then, we have

g_ρ(−x, y, z) = g_ρ(x, y, z)|₁,   (5.1.28)
g_ρ(x, −y, z) = g_ρ(x, y, z)|₂,   (5.1.29)
g_ρ(x, y, −z) = g_ρ(x, y, z)|₃,   (5.1.30)

and

∫_{−∞}^{0}∫_{0}^{∞}∫_{0}^{∞} h(x, y, z) g_ρ(x, y, z) dx dy dz
 = ∫_{∞}^{0}∫_{0}^{∞}∫_{0}^{∞} h(x, y, −t) g_ρ(x, y, −t) dx dy (−dt)
 = [ ∫_{0}^{∞}∫_{0}^{∞}∫_{0}^{∞} h(x, y, −z) g_ρ(x, y, z) dx dy dz ]|₃   (5.1.31)

for a tri-variate function h(x, y, z). Here, |_k denotes the replacement of the correlation coefficients ρ_{jk} and ρ_{ki} with −ρ_{jk} and −ρ_{ki}, respectively.

Example 5.1.9 We have β_{ij,k} = β_{ji,k},

β_{ij,k}|ᵢ = −β_{ij,k},   (5.1.32)
β_{ij,k}|ₖ = β_{ij,k},   (5.1.33)
∂/∂ρ_{ij} sin⁻¹β_{ij,k} = 1/√|K₃|,   (5.1.34)

and

∂/∂ρ_{jk} sin⁻¹β_{ij,k} = −(ρ_{ki} − ρ_{ij}ρ_{jk}) / { (1 − ρ_{jk}²)√|K₃| }   (5.1.35)

for the standard tri-variate normal distribution. ♦

After steps similar to those used in obtaining (5.1.15), we can obtain the following
theorem:
 
Theorem 5.1.4 Letting ξ_{ij} = sgn(ρ_{ij}), we have

f₃(x, y, z) → [ exp{ −(1/2)x² } / (2π√(1 − ρ₃₁²)) ] exp[ −{ z − μ₁(x, y) }²/(2(1 − ρ₃₁²)) ] δ(x − ξ₁₂y)   (5.1.36)

when ρ₁₂ → ±1, where μ₁(x, y) = (1/2)ξ₁₂(ρ₂₃x + ρ₃₁y). We subsequently have

f₃(x, y, z) → [ exp{ −(1/2)x² } / √(2π) ] δ(x − ξ₁₂y) δ(x − ξ₃₁z)   (5.1.37)

when ρ₁₂ → ±1 and ρ₃₁ → ±1.

Proof The proof is discussed in Exercise 5.41. ♠


   
In (5.1.36), we can replace ρ₃₁² with ρ₂₃² and exp{ −(1/2)x² } with exp{ −(1/2)y² }. The 'mean' μ₁(x, y) of X₃, when (X₁, X₂, X₃) has the pdf (5.1.36), can be written also as μ₁(x, y) = (1/2)ξ₁₂(ξ₁₂ρ₃₁x + ρ₃₁y) = (1/2)ξ₁₂ρ₃₁(ξ₁₂x + y) because, due to the condition |K₃| ≥ 0, the result lim_{ρ₁₂→±1}|K₃| = −(ρ₂₃ ∓ ρ₃₁)² requires that ρ₂₃ → ρ₁₂ρ₃₁, i.e., ρ₂₃ = ξ₁₂ρ₃₁ when ρ₁₂ → ±1. In addition, because of the function δ(x − ξ₁₂y), the mean can further be rewritten as μ₁(x, y) = (1/2)ξ₁₂ρ₃₁(ξ₁₂x + ξ₁₂x) = ρ₃₁x or as μ₁(x, y) = ρ₂₃y. Similarly, (5.1.37) can be expressed in various equivalent formulas: for instance, exp{ −(1/2)x² } can be replaced with exp{ −(1/2)y² } or exp{ −(1/2)z² } and δ(x − ξ₃₁z) can be replaced with δ(z − ξ₃₁x) or δ(y − ξ₂₃z).
The result (5.1.37) in Theorem 5.1.4 can be generalized as follows:

Theorem 5.1.5 For the pdf f_X(x) shown in (5.1.1), we have

f_X(x) → [ exp{ −(x₁ − m₁)²/(2σ₁²) } / √(2πσ₁²) ] ∏_{i=2}^{n} (1/σᵢ) δ( (x₁ − m₁)/σ₁ − ξ₁ᵢ(xᵢ − mᵢ)/σᵢ )   (5.1.38)

when ρ₁ⱼ → ±1 for j = 2, 3, . . . , n, where ξ₁ⱼ = sgn(ρ₁ⱼ).

Note in Theorem 5.1.5 that, when ρ₁ⱼ → ±1 for j ∈ {2, 3, . . . , n}, the value of ρᵢⱼ for i ∈ {2, 3, . . . , n} and j ∈ {2, 3, . . . , n} is determined as ρᵢⱼ → ρ₁ᵢρ₁ⱼ = ξ₁ᵢξ₁ⱼ. In the tri-variate case, for instance, when ρ₁₂ → 1 and ρ₃₁ → 1, we have ρ₂₃ → 1 from lim_{ρ₁₂,ρ₁₃→1} |K₃| = −(1 − ρ₂₃)² ≥ 0.

5.2 Properties

In this section, we discuss the properties (Hamedani 1984; Horn and Johnson 1985;
Melnick and Tenenbein 1982; Mihram 1969; Pierce and Dykstra 1969) of normal
random vectors. Some of the properties we will discuss in this chapter are based on
those described in Chap. 4. We will also present properties unique to normal random
vectors.

5.2.1 Distributions of Subvectors and Conditional Distributions

For X = (X₁, X₂, . . . , Xₙ)ᵀ ∼ N(m, K), let us partition the covariance matrix K and its inverse matrix K⁻¹ between the s-th and (s + 1)-st rows, and between the s-th and (s + 1)-st columns also, as
 
K = [ K₁₁  K₁₂ ]
    [ K₂₁  K₂₂ ]   (5.2.1)

and

K⁻¹ = [ Ψ₁₁  Ψ₁₂ ]
      [ Ψ₂₁  Ψ₂₂ ].   (5.2.2)

Then, we have K_{ii} = K_{ii}ᵀ and Ψ_{ii} = Ψ_{ii}ᵀ for i = 1, 2, K₂₁ = K₁₂ᵀ, and Ψ₂₁ = Ψ₁₂ᵀ.
We also have

Ψ₁₁ = K₁₁⁻¹ + K₁₁⁻¹ K₁₂ ξ⁻¹ K₂₁ K₁₁⁻¹,   (5.2.3)
Ψ₁₂ = −K₁₁⁻¹ K₁₂ ξ⁻¹,   (5.2.4)
Ψ₂₁ = −ξ⁻¹ K₂₁ K₁₁⁻¹,   (5.2.5)

and

Ψ₂₂ = ξ⁻¹,   (5.2.6)

where ξ = K₂₂ − K₂₁ K₁₁⁻¹ K₁₂.

Theorem 5.2.1 Assume X = (X₁, X₂, . . . , Xₙ)ᵀ ∼ N(m, K) and the partition of the covariance matrix K as described in (5.2.1). Then, for a subvector X⁽²⁾ = (X_{s+1}, X_{s+2}, . . . , Xₙ)ᵀ of X, we have (Johnson and Kotz 1972)

X⁽²⁾ ∼ N(m⁽²⁾, K₂₂),   (5.2.7)

where m⁽²⁾ = (m_{s+1}, m_{s+2}, . . . , mₙ)ᵀ. In other words, any subvector of a normal random vector is a normal random vector.
Theorem 5.2.1 also implies that every element of a normal random vector is a
normal random variable, which we have already observed in Example 5.1.2. However,
it should again be noted that the converse of Theorem 5.2.1 does not hold true as we
can see in the example below.
Example 5.2.1 (Romano and Siegel 1986) Assume the joint pdf

f_{X,Y}(x, y) = { 2g(x, y), xy ≥ 0,
               { 0,        xy < 0      (5.2.8)

of (X, Y), where g(x, y) = (1/2π) exp{ −(1/2)(x² + y²) }. Then, we have the marginal pdf

f_X(x) = { (1/π) exp(−x²/2) ∫_{−∞}^{0} exp(−y²/2) dy, x < 0,
         { (1/π) exp(−x²/2) ∫_{0}^{∞} exp(−y²/2) dy,  x ≥ 0
       = (1/√(2π)) exp(−x²/2)   (5.2.9)

of X. We can similarly show that Y is also a normal random variable. In other words, although X and Y are both normal random variables, (X, Y) is not a normal random vector. ♦
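The construction in this example is easy to simulate: drawing X as standard normal and Y = sgn(X)|W| with W an independent standard normal puts all mass on xy ≥ 0 and yields exactly the pdf (5.2.8). The marginals then behave like N(0, 1) while the pair clearly fails to be jointly normal.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300_000
X = rng.standard_normal(n)
Y = np.sign(X) * np.abs(rng.standard_normal(n))   # all mass on x*y >= 0

# Marginals look standard normal ...
mean_err = max(abs(float(X.mean())), abs(float(Y.mean())))
var_err = max(abs(float(X.var()) - 1), abs(float(Y.var()) - 1))
p_y_negative = float((Y < 0).mean())              # about 1/2, as for N(0, 1)

# ... but the joint distribution is not normal: opposite-sign quadrants are empty.
p_opposite_sign = float(((X > 0) & (Y < 0)).mean())
```

A genuinely bi-variate normal pair with these marginals would put positive probability on the opposite-sign quadrants; here that probability is exactly zero.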

Theorem 5.2.2 Assume X = (X₁, X₂, . . . , Xₙ)ᵀ ∼ N(m, K) and the partition of the inverse K⁻¹ of the covariance matrix K as described in (5.2.2). Then, we have the conditional distribution (Johnson and Kotz 1972)

N( (m⁽¹⁾)ᵀ − (x⁽²⁾ − m⁽²⁾)ᵀ Ψ₂₁ Ψ₁₁⁻¹ , Ψ₁₁⁻¹ )   (5.2.10)

of X⁽¹⁾ = (X₁, X₂, . . . , Xₛ)ᵀ when X⁽²⁾ = (X_{s+1}, X_{s+2}, . . . , Xₙ)ᵀ = x⁽²⁾ = (x_{s+1}, x_{s+2}, . . . , xₙ)ᵀ is given, where m⁽¹⁾ = (m₁, m₂, . . . , mₛ)ᵀ and m⁽²⁾ = (m_{s+1}, m_{s+2}, . . . , mₙ)ᵀ.
 
Example 5.2.2 From the joint pdf (5.1.8) of N(m₁, m₂, σ₁², σ₂², ρ) and the pdf of N(m₂, σ₂²), the conditional pdf f_{X|Y}(x|y) for a normal random vector (X, Y) ∼ N(m₁, m₂, σ₁², σ₂², ρ) can be obtained as

f_{X|Y}(x|y) = 1/√(2πσ₁²(1 − ρ²)) exp[ −(x − m_{X|Y=y})²/(2σ₁²(1 − ρ²)) ],   (5.2.11)

where m_{X|Y=y} = m₁ + ρ(σ₁/σ₂)(y − m₂). In short, the distribution of X given Y = y is N(m_{X|Y=y}, σ₁²(1 − ρ²)) for (X, Y) ∼ N(m₁, m₂, σ₁², σ₂², ρ). This result can be obtained also from (5.2.10) using (m⁽¹⁾)ᵀ = m₁, (x⁽²⁾ − m⁽²⁾)ᵀ = y − m₂, Ψ₂₁ = −ρ/{σ₁σ₂(1 − ρ²)}, and Ψ₁₁⁻¹ = σ₁²(1 − ρ²). ♦
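The conditional mean and variance in (5.2.11) can be checked by conditioning a simulation on a thin slab around a chosen y₀ (all numeric values below are arbitrary).

```python
import math
import numpy as np

rng = np.random.default_rng(5)
m1, m2, s1, s2, rho = 1.0, -2.0, 1.5, 0.5, 0.6
n = 1_000_000

# (X, Y) ~ N(m1, m2, s1^2, s2^2, rho).
u = rng.standard_normal(n)
v = rng.standard_normal(n)
X = m1 + s1 * u
Y = m2 + s2 * (rho * u + math.sqrt(1 - rho**2) * v)

# Condition on Y in a thin slab around y0 and compare with (5.2.11).
y0 = -1.5
sel = np.abs(Y - y0) < 0.01
m_cond = m1 + rho * (s1 / s2) * (y0 - m2)   # conditional mean
v_cond = s1**2 * (1 - rho**2)               # conditional variance

emp_mean = float(X[sel].mean())
emp_var = float(X[sel].var())
```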
Theorem 5.2.3 If a normal random vector is an uncorrelated random vector, then
it is an independent random vector.
In general, two uncorrelated random variables are not necessarily independent
as we have discussed in Chap. 4. Theorem 5.2.3 tells us that two jointly normal
random variables are independent of each other if they are uncorrelated, which can
promptly be confirmed because we have f X,Y (x, y) = f X (x) f Y (y) when ρ = 0 in
the two-dimensional normal pdf (5.1.8). In Theorem 5.2.3, the key point is that the
two random variables are jointly normal (Wise and Hall 1993): in other words, if
two normal random variables are not jointly normal but are only marginally normal,
they may or may not be independent when they are uncorrelated.

Example 5.2.3 (Stoyanov 2013) Let φ1 (x, y) and φ2 (x, y) be two standard bi-
variate normal pdf’s with correlation coefficients ρ1 and ρ2 , respectively. Assume
that the random vector (X, Y ) has the joint pdf

f X,Y (x, y) = c1 φ1 (x, y) + c2 φ2 (x, y), (5.2.12)

where c₁ > 0, c₂ > 0, and c₁ + c₂ = 1. Then, when ρ₁ ≠ ρ₂, f_{X,Y} is not a normal pdf
and, therefore, (X, Y ) is not a normal random vector. Now, we have X ∼ N (0, 1),
Y ∼ N (0, 1), and the correlation coefficient between X and Y is ρ X Y = c1 ρ1 + c2 ρ2 .
If we choose c₁ = ρ₂/(ρ₂ − ρ₁) and c₂ = ρ₁/(ρ₁ − ρ₂) for ρ₁ρ₂ < 0, then c₁ > 0, c₂ > 0, c₁ + c₂ =
2
1
and c2 = ρ1ρ−ρ
1
2
for ρ1 ρ2 < 0, then c1 > 0, c2 > 0, c1 + c2 =
1, and ρ X Y = 0. In short, although X and Y are both normal and uncorrelated with
each other, they are not independent of each other because (X, Y ) is not a normal
random vector. ♦

Based on Theorem 5.2.3, we can show the following theorem:

Theorem 5.2.4 For a normal random vector X = (X₁, X₂, . . . , Xₙ), consider k non-overlapping subvectors X₁ = (X_{i₁}, X_{i₂}, . . . , X_{i_{n₁}}), X₂ = (X_{j₁}, X_{j₂}, . . . , X_{j_{n₂}}), . . ., X_k = (X_{l₁}, X_{l₂}, . . . , X_{l_{n_k}}), where Σ_{j=1}^{k} n_j = n. If ρ_{ij} = 0 for every choice of i ∈ S_a and j ∈ S_b with a ≠ b, a ∈ {1, 2, . . . , k} and b ∈ {1, 2, . . . , k}, then X₁, X₂, . . ., X_k are independent of each other, where S₁ = {i₁, i₂, . . . , i_{n₁}}, S₂ = {j₁, j₂, . . . , j_{n₂}}, . . ., and S_k = {l₁, l₂, . . . , l_{n_k}}.

Example 5.2.4 For a normal random vector X = (X1, X2, …, X5) with the covariance matrix

      ⎛ 1    0    ρ13  ρ14  0   ⎞
      ⎜ 0    1    0    0    ρ25 ⎟
  K = ⎜ ρ31  0    1    ρ34  0   ⎟ ,   (5.2.13)
      ⎜ ρ41  0    ρ43  1    0   ⎟
      ⎝ 0    ρ52  0    0    1   ⎠

the subvectors X1 = (X1, X3, X4) and X2 = (X2, X5) are independent of each other.


5.2.2 Linear Transformations

Let us first consider a generalization of the result obtained in Example 5.1.5 that the
sum of two jointly normal random variables is a normal random variable.
Theorem 5.2.5 When the random variables Xi ∼ N(mi, σi^2), i = 1, 2, …, n, are independent of each other, we have

  Σ_{i=1}^n Xi ∼ N( Σ_{i=1}^n mi, Σ_{i=1}^n σi^2 ).   (5.2.14)

Proof Because the cf of Xi is φ_{Xi}(ω) = exp(jmiω − (1/2)σi^2ω^2), we can obtain the cf φ_Y(ω) = Π_{i=1}^n exp(jmiω − (1/2)σi^2ω^2) of Y = Σ_{i=1}^n Xi as

  φ_Y(ω) = exp{ j( Σ_{i=1}^n mi )ω − (1/2)( Σ_{i=1}^n σi^2 )ω^2 }   (5.2.15)

using (4.3.32). This result implies (5.2.14). ♠
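A quick Monte Carlo sanity check of Theorem 5.2.5 can be sketched as follows (the parameter triples are illustrative assumptions): the empirical mean and variance of the sum of independent normal samples should match Σmi and Σσi^2.

```python
# Monte Carlo check of Theorem 5.2.5 (parameters are assumed for illustration):
# the sum of independent N(m_i, sigma_i^2) samples has mean sum(m_i) and
# variance sum(sigma_i^2).
import random
import statistics

random.seed(0)
params = [(1.0, 2.0), (-2.0, 0.5), (3.0, 1.5)]   # assumed (m_i, sigma_i) pairs

n_trials = 200_000
sums = [sum(random.gauss(m, s) for m, s in params) for _ in range(n_trials)]

m_sum = sum(m for m, _ in params)            # 2.0
var_sum = sum(s ** 2 for _, s in params)     # 6.5

print(statistics.fmean(sums))      # close to 2.0
print(statistics.pvariance(sums))  # close to 6.5
```

The distributional claim (that the sum is again normal, not merely matching in mean and variance) follows from the cf argument in the proof; the simulation only confirms the first two moments.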

Generalizing Theorem 5.2.5 further, we have the following theorem that a linear
transformation of a normal random vector is also a normal random vector:

Theorem 5.2.6 When X = (X1, X2, …, Xn)^T ∼ N(m, K), we have LX ∼ N(Lm, LKL^T) when L is an n × n matrix such that |L| ≠ 0.

Proof Let Y = LX. First, we have X = L^{−1}Y because |L| ≠ 0, and the Jacobian of the inverse transformation x = g^{−1}(y) = L^{−1}y is |∂g^{−1}(y)/∂y| = |L^{−1}| = 1/|L|. Thus, we have the pdf f_Y(y) = (1/|L|) f_X(x)|_{x=L^{−1}y} of Y as

  f_Y(y) = exp{ −(1/2)(L^{−1}y − m)^T K^{−1} (L^{−1}y − m) } / { |L| √((2π)^n |K|) }   (5.2.16)

from Theorem 4.2.1. Now, note that (L^T)^{−1} = (L^{−1})^T, |L^T| = |L|, and (L^{−1}y − m)^T K^{−1} (L^{−1}y − m) = (y − Lm)^T (L^T)^{−1} K^{−1} L^{−1} (y − Lm). In addition, letting H = LKL^T, we have H^{−1} = (L^T)^{−1} K^{−1} L^{−1} and |H| = |LKL^T| = |L|^2 |K|. Then, we can rewrite (5.2.16) as

  f_Y(y) = (1/√((2π)^n |H|)) exp{ −(1/2)(y − Lm)^T H^{−1} (y − Lm) },   (5.2.17)

which implies LX ∼ N(Lm, LKL^T) when X ∼ N(m, K). ♠

Theorem 5.2.6 is a combined generalization of the facts that the sum of two jointly
normal random variables is a normal random variable, as described in Example 5.1.5,
and that the sum of a number of independent normal random variables is a normal
random variable, as shown in Theorem 5.2.5.

Example 5.2.5 For (X, Y) ∼ N(10, 0, 4, 1, 0.5), find numbers a and b so that Z = aX + bY and W = X + Y are uncorrelated.

Solution Clearly, E{ZW} − E{Z}E{W} = E{aX^2 + bY^2 + (a + b)XY} − 100a = 5a + 2b because E{Z} = 10a and E{W} = 10. Thus, for any pair of real numbers a and b such that 5a + 2b = 0, the two random variables Z = aX + bY and W = X + Y will be uncorrelated. ♦

Example 5.2.6 For a random vector (X, Y) ∼ N(10, 0, 4, 1, 0.5), obtain the joint distribution of Z = X + Y and W = X − Y.

Solution We first note that Z and W are jointly normal from Theorem 5.2.6. We thus only need to obtain E{Z}, E{W}, Var{Z}, Var{W}, and ρ_ZW. We first have E{Z} = 10 and E{W} = 10 from E{X ± Y} = E{X} ± E{Y}. Next, we have the variance σ_Z^2 = E{(X + Y − 10)^2} = E{ {(X − 10) + Y}^2 } of Z as

  σ_Z^2 = σ_X^2 + 2E{XY − 10Y} + σ_Y^2
        = 7   (5.2.18)

from E{XY} − E{X}E{Y} = ρσ_Xσ_Y = 1 and, similarly, the variance σ_W^2 = 3 of W. In addition, we also get E{ZW} = E{X^2 − Y^2} = m_X^2 + σ_X^2 − σ_Y^2 = 103 and, consequently, the correlation coefficient ρ_ZW = (103 − 100)/(√7√3) = √(3/7) between Z and W. Thus, we have (Z, W) ∼ N(10, 10, 7, 3, √(3/7)). In passing, the joint pdf of (Z, W) is f_{Z,W}(x, y) = (1/(2π√7√3√(1 − 3/7))) exp[ −(1/(2(1 − 3/7))){ (x − 10)^2/7 − 2√(3/7)(x − 10)(y − 10)/(√7√3) + (y − 10)^2/3 } ], i.e.,

  f_{Z,W}(x, y) = (√3/(12π)) exp{ −(1/24)( 3x^2 − 6xy + 7y^2 − 80y + 400 ) }.   (5.2.19)

The distribution of (Z, W) can also be obtained from Theorem 5.2.6 more directly as follows: because V = (Z, W)^T = [[1, 1], [1, −1]](X, Y)^T = L(X, Y)^T, we have the mean vector E{V} = L E{(X, Y)^T} = [[1, 1], [1, −1]](10, 0)^T = (10, 10)^T and the covariance matrix K_V = LKL^T = [[1, 1], [1, −1]] [[4, 1], [1, 1]] [[1, 1], [1, −1]]^T = [[7, 3], [3, 3]] of V. Thus, (Z, W) ∼ N( (10, 10)^T, [[7, 3], [3, 3]] ) = N(10, 10, 7, 3, √(3/7)). ♦
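The matrix route at the end of Example 5.2.6 is easy to reproduce numerically; a minimal sketch (our addition, using the example's L, m, and K):

```python
# Direct check of Example 5.2.6 via Theorem 5.2.6: for V = L(X, Y)^T with
# L = [[1, 1], [1, -1]], the mean vector is Lm and the covariance is L K L^T.
import numpy as np

L = np.array([[1.0, 1.0], [1.0, -1.0]])
m = np.array([10.0, 0.0])
K = np.array([[4.0, 1.0], [1.0, 1.0]])   # sigma_X^2 = 4, sigma_Y^2 = 1, rho = 0.5

mean_V = L @ m          # [10, 10]
K_V = L @ K @ L.T       # [[7, 3], [3, 3]]
rho_ZW = K_V[0, 1] / np.sqrt(K_V[0, 0] * K_V[1, 1])  # sqrt(3/7)

print(mean_V, K_V, rho_ZW)
```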

From Theorem 5.2.6, the linear combination Σ_{i=1}^n ai Xi of the components of a normal random vector X = (X1, X2, …, Xn) is a normal random variable. Let us again emphasize that, while Theorem 5.2.6 tells us that a linear transformation of jointly normal random variables produces jointly normal random variables, a linear transformation of random variables which are normal only marginally but not jointly is not guaranteed to produce normal random variables (Wise and Hall 1993).

As we can see in Examples 5.2.7–5.2.9 below, when {Xi}_{i=1}^n are all normal random variables but X = (X1, X2, …, Xn) is not a normal random vector, (A) the normal random variables {Xi}_{i=1}^n are generally not independent even if they are uncorrelated, (B) a linear combination of {Xi}_{i=1}^n may or may not be a normal random variable, and (C) a linear transformation of X is not a normal random vector.

Example 5.2.7 (Romano and Siegel 1986) Let X ∼ N (0, 1) and H be the outcome
from a toss of a fair coin. Then, we have Y ∼ N (0, 1) for the random variable

  Y = { X,    H = head,
      { −X,   H = tail.   (5.2.20)

Now, because E{X} = 0, E{Y} = 0, and E{XY} = E{E{XY|H}} = (1/2)E{X^2} + (1/2)E{−X^2} = 0, the random variables X and Y are uncorrelated. However, X and Y are not independent because, for instance, P(|X| > 1)P(|Y| < 1) > 0
ever, X and Y are not independent because, for instance, P(|X | > 1)P(|Y | < 1) > 0
while P(|X | > 1, |Y | < 1) = 0. In addition, X + Y is not normal. In other words,
even when X and Y are both normal random variables, X + Y could be non-normal
if (X, Y ) is not a normal random vector. ♦

Example 5.2.8 (Romano and Siegel 1986) Let X ∼ N (0, 1) and



  Y = { X,    |X| ≤ α,
      { −X,   |X| > α   (5.2.21)

for a positive number α. Then, X and Y are not independent. In addition, Y is also
a standard normal random variable because, for any set B such that B ∈ B(R), we
have

  P(Y ∈ B) = P(Y ∈ B| |X| ≤ α)P(|X| ≤ α) + P(Y ∈ B| |X| > α)P(|X| > α)
           = P(X ∈ B| |X| ≤ α)P(|X| ≤ α) + P(−X ∈ B| |X| > α)P(|X| > α)
           = P(X ∈ B| |X| ≤ α)P(|X| ≤ α) + P(X ∈ B| |X| > α)P(|X| > α)
           = P(X ∈ B).   (5.2.22)
Now, the correlation coefficient ρ_XY = E{XY} = 2∫_0^α x^2 φ(x) dx − 2∫_α^∞ x^2 φ(x) dx between X and Y can be obtained as

  ρ_XY = 4∫_0^α x^2 φ(x) dx − 1,   (5.2.23)

where φ denotes the standard normal pdf. Letting g(α) = ∫_0^α x^2 φ(x) dx, we can find a positive number α0 (here, α0 ≈ 1.54) such that g(α0) = 1/4 because g(0) = 0, g(∞) = 1/2, and g is a continuous function. Therefore, when α = α0, X and Y are uncorrelated from (5.2.23). Meanwhile, because

  X + Y = { 2X,   |X| ≤ α,
          { 0,    |X| > α,   (5.2.24)

X + Y is not normal. ♦
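The threshold α0 in Example 5.2.8 can be located numerically. Integration by parts gives the closed form g(α) = Φ(α) − αφ(α) − 1/2 (this closed form is our derivation, not stated in the text), which a simple bisection can solve for g(α0) = 1/4:

```python
# Locating alpha_0 with g(alpha_0) = 1/4 in Example 5.2.8. Integration by
# parts of int_0^alpha x^2 phi(x) dx gives g(alpha) = Phi(alpha)
# - alpha*phi(alpha) - 1/2, solved here by bisection.
import math

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def g(alpha):
    # g(alpha) = integral_0^alpha x^2 phi(x) dx (closed form via parts)
    return Phi(alpha) - alpha * phi(alpha) - 0.5

lo, hi = 0.5, 3.0
for _ in range(60):            # bisection on the increasing function g
    mid = 0.5 * (lo + hi)
    if g(mid) < 0.25:
        lo = mid
    else:
        hi = mid

alpha0 = 0.5 * (lo + hi)
print(round(alpha0, 2))  # 1.54
```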

Example 5.2.9 (Stoyanov 2013) When X = (X, Y) is a normal random vector, the random variables X, Y, and X + Y are all normal. Yet, the converse is not necessarily true. We now consider an example. Let the joint pdf of X = (X, Y) be

  f_X(x, y) = (1/(2π)) exp{ −(1/2)(x^2 + y^2) } × [ 1 + ε x y (x^2 − y^2) exp{ −(1/2)(x^2 + y^2 + 2ε) } ],   (5.2.25)

where ε > 0. Let us also note that

  | ε x y (x^2 − y^2) exp{ −(1/2)(x^2 + y^2 + 2ε) } | ≤ 1   (5.2.26)

when ε ≥ −2 + ln 4 ≈ −0.6137 because −4e^{−2} ≤ x y (x^2 − y^2) exp{ −(1/2)(x^2 + y^2) } ≤ 4e^{−2}. Then, the joint cf of X can be obtained as

  φ_X(s, t) = exp( −(s^2 + t^2)/2 ) + ε ( st(s^2 − t^2)/32 ) exp( −ε − (s^2 + t^2)/4 ),   (5.2.27)

from which we can make the following observations:

(A) We have X ∼ N(0, 1) and Y ∼ N(0, 1) because φ_X(t) = φ_X(t, 0) = exp(−t^2/2) and φ_Y(t) = φ_X(0, t) = exp(−t^2/2).
(B) We have X + Y ∼ N(0, 2) because φ_{X+Y}(t) = φ_X(t, t) = exp(−t^2): as we have observed in Exercise 4.62, when the joint cf of X = (X, Y) is φ_X(t, s), the cf of Z = aX + bY is φ_Z(t) = φ_X(at, bt).
(C) We have X − Y ∼ N(0, 2) because φ_{X−Y}(t) = φ_X(t, −t) = exp(−t^2).
(D) We have E{XY} = −∂^2 φ_X(s, t)/∂t∂s |_{(s,t)=(0,0)} = 0 and E{X} = E{Y} = 0. Therefore, X and Y are uncorrelated.
(E) As is clear from (5.2.25) or (5.2.27), the random vector (X, Y) is not a normal random vector. In other words, although X, Y, and X + Y are all normal random variables, (X, Y) is not a normal random vector. ♦
Theorem 5.2.7 A normal random vector with a positive definite covariance matrix can be linearly transformed into an independent standard normal random vector.

Proof Theorem 5.2.7 can be proved from Theorems 4.3.5, 5.2.3, and 5.2.6, or from (4.3.24) and (4.3.25). Specifically, when X ∼ N(m, K) with |K| > 0, the eigenvalues of K are {λi}_{i=1}^n, and the eigenvector corresponding to λi is ai, consider the matrix A and λ̃ = diag(1/√λ1, 1/√λ2, …, 1/√λn) introduced in (4.3.21) and (4.3.23), respectively. Then, the mean vector of

  Y = λ̃ A (X − m)   (5.2.28)

is E{Y} = λ̃ A E{X − m} = 0. In addition, as we can see from (4.3.25), the covariance matrix of Y is K_Y = I. Therefore, using Theorem 5.2.6, we get Y ∼ N(0, I). In other words, Y = λ̃ A (X − m) is a vector of independent standard normal random variables. ♠
Example 5.2.10 Transform the random vector U = (U1, U2)^T ∼ N(m, K) into an independent standard normal random vector when m = (10, 0)^T and K = [[2, −1], [−1, 2]].

Solution The eigenvalues and corresponding eigenvectors of the covariance matrix K of U are λ1 = 3, a1 = (1/√2)(1, −1)^T and λ2 = 1, a2 = (1/√2)(1, 1)^T. Thus, for the linear transformation L = [[1/√3, 0], [0, 1]] (1/√2)[[1, −1], [1, 1]], i.e.,

  L = [[ 1/√6, −1/√6 ],
       [ 1/√2,  1/√2 ]],   (5.2.29)

the random vector V = L(U − (10, 0)^T) will be a vector of independent standard normal random variables: the covariance matrix of V is K_V = L K_U L^T = [[1/√6, −1/√6], [1/√2, 1/√2]] [[2, −1], [−1, 2]] [[1/√6, 1/√2], [−1/√6, 1/√2]] = [[1, 0], [0, 1]]. ♦

In passing, the following theorem is noted without a proof:

Theorem 5.2.8 If the linear combination a^T X is a normal random variable for every vector a = (a1, a2, …, an)^T, then the random vector X is a normal random vector, and the converse is also true.

5.3 Expected Values of Nonlinear Functions

In this section, expected values of some non-linear functions and joint moments (Bär
and Dittrich 1971; Baum 1957; Brown 1957; Hajek 1969; Haldane 1942; Holmquist
1988; Kan 2008; Nabeya 1952; Song and Lee 2015; Song et al. 2020; Triantafyl-
lopoulos 2003; Withers 1985) of normal random vectors are investigated. We first
consider a few simple examples based on the cf and mgf.

5.3.1 Examples of Joint Moments

Let us start with some examples for obtaining joint moments of normal random
vectors via cf and mgf.

Example 5.3.1 For the joint central moment μij = E{(X − m_X)^i (Y − m_Y)^j} of a random vector (X, Y), we have observed that μ00 = 1, μ01 = μ10 = 0, μ20 = σ1^2, μ02 = σ2^2, and μ11 = ρσ1σ2 in Sect. 4.3.2.1. In addition, it is easy to see that μ30 = μ03 = 0 and μ40 = 3σ1^4 when (X, Y) ∼ N(0, 0, σ1^2, σ2^2, ρ) from (3.3.31). Now, based on the moment theorem, show that μ31 = 3ρσ1^3σ2, μ22 = (1 + 2ρ^2)σ1^2σ2^2, and μ41 = μ32 = 0 when (X, Y) ∼ N(0, 0, σ1^2, σ2^2, ρ).

Solution For convenience, let C = E{XY} = ρσ1σ2, A = (1/2)(σ1^2 s1^2 + 2C s1 s2 + σ2^2 s2^2), and A^(ij) = ∂^{i+j}A/(∂s1^i ∂s2^j). Then, we easily have A^(10)|_{s1=0,s2=0} = (σ1^2 s1 + C s2)|_{s1=0,s2=0} = 0, A^(01)|_{s1=0,s2=0} = (σ2^2 s2 + C s1)|_{s1=0,s2=0} = 0, A^(20) = σ1^2, A^(11) = C, A^(02) = σ2^2, and A^(ij) = 0 for i + j ≥ 3. Denoting the joint mgf of (X, Y) by M = M(s1, s2) = exp(A) and employing the notation M^(ij) = ∂^{i+j}M/(∂s1^i ∂s2^j), we get M^(10) = M A^(10), M^(20) = M{ A^(20) + (A^(10))^2 },

  M^(21) = M{ A^(21) + A^(20)A^(01) + 2A^(11)A^(10) + (A^(10))^2 A^(01) }
         = M{ A^(20)A^(01) + 2A^(11)A^(10) + (A^(10))^2 A^(01) },   (5.3.1)

  M^(31) = M[ 3A^(20){ A^(11) + A^(10)A^(01) } + (A^(10))^2 { 3A^(11) + A^(10)A^(01) } ],   (5.3.2)

  M^(22) = M[ A^(20)A^(02) + 2(A^(11))^2 + 4A^(11)A^(10)A^(01) + A^(20)(A^(01))^2 + (A^(10))^2 A^(02) + (A^(10))^2 (A^(01))^2 ],   (5.3.3)

  M^(41) = B41 M,   (5.3.4)

and

  M^(32) = B32 M.   (5.3.5)

Here,

  B41 = 3(A^(20))^2 A^(01) + 12A^(20)A^(11)A^(10) + 6A^(20)(A^(10))^2 A^(01) + 4A^(11)(A^(10))^3 + (A^(10))^4 A^(01)   (5.3.6)

and

  B32 = 6A^(20)A^(11)A^(01) + 3A^(20)A^(02)A^(10) + 6A^(11)(A^(10))^2 A^(01) + 6(A^(11))^2 A^(10) + A^(02)(A^(10))^3 + 3A^(20)A^(10)(A^(01))^2 + (A^(10))^3 (A^(01))^2.   (5.3.7)

Recollecting that M(0, 0) = 1 and that A^(10) = A^(01) = 0 at s1 = s2 = 0, we have μ31 = M^(31)|_{s1=0,s2=0} = 3A^(20)A^(11) = 3σ1^2 C = 3ρσ1^3σ2 from (5.3.2), μ22 = M^(22)|_{s1=0,s2=0} = A^(20)A^(02) + 2(A^(11))^2 = σ1^2σ2^2 + 2C^2 = (1 + 2ρ^2)σ1^2σ2^2 from (5.3.3), μ41 = M^(41)|_{s1=0,s2=0} = 0 from (5.3.4) and (5.3.6), and μ32 = M^(32)|_{s1=0,s2=0} = 0 from (5.3.5) and (5.3.7). ♦
In Exercise 5.15, it is shown that

  E{X1^2 X2^2 X3^2} = 1 + 2( ρ12^2 + ρ23^2 + ρ31^2 ) + 8ρ12ρ23ρ31   (5.3.8)

for (X1, X2, X3) ∼ N(0, K3). Similarly, it is shown in Exercise 5.18 that

  E{X1X2X3} = m1 E{X2X3} + m2 E{X3X1} + m3 E{X1X2} − 2m1m2m3   (5.3.9)

for a general tri-variate normal random vector (X1, X2, X3) and that

  E{X1X2X3X4} = E{X1X2}E{X3X4} + E{X1X3}E{X2X4} + E{X1X4}E{X2X3} − 2m1m2m3m4   (5.3.10)

for a general quadri-variate normal random vector (X1, X2, X3, X4). The results (5.3.8)–(5.3.10) can also be obtained via the general formula (5.3.51).

5.3.2 Price’s Theorem

We now discuss a theorem that is quite useful in evaluating the expected values of
various non-linear functions such as the power functions, sign functions, and absolute
values of normal random vectors.
Denoting the covariance between Xi and Xj by

  ρ̃ij = Rij − mi mj,   (5.3.11)

where Rij = E{Xi Xj} and mi = E{Xi}, the correlation coefficient ρij between Xi and Xj and the variance σi^2 of Xi can be expressed as ρij = ρ̃ij/√(ρ̃ii ρ̃jj) and σi^2 = ρ̃ii, respectively.

Theorem 5.3.1 Let K = [ρ̃rs] be the covariance matrix of an n-variate normal random vector X. When {gi(·)}_{i=1}^n are all memoryless functions, we have

  ∂^{γ1} E{ Π_{i=1}^n gi(Xi) } / ( ∂ρ̃_{r1 s1}^{k1} ∂ρ̃_{r2 s2}^{k2} ⋯ ∂ρ̃_{rN sN}^{kN} ) = (1/2^{γ2}) E{ Π_{i=1}^n gi^{(γ3,i)}(Xi) },   (5.3.12)

where γ1 = Σ_{j=1}^N kj, γ2 = Σ_{j=1}^N kj δ_{rj sj}, and γ3,i = Σ_{j=1}^N ℓij kj. Here, δij is the Kronecker delta function defined in (4.3.17), N ∈ {1, 2, …, n(n+1)/2}, gi^{(k)}(x) = d^k gi(x)/dx^k, and rj ∈ {1, 2, …, n} and sj ∈ {1, 2, …, n} for j = 1, 2, …, N. In addition, ℓij = δ_{i rj} + δ_{i sj} ∈ {0, 1, 2} denotes how many of rj and sj are equal to i for i = 1, 2, …, n and j = 1, 2, …, N and satisfies Σ_{i=1}^n ℓij = 2.

Based on Theorem 5.3.1, referred to as Price's theorem (Price 1958), let us describe how we can obtain the joint moments and the expected values of non-linear functions of normal random vectors in the three cases of n = 1, 2, and 3.

5.3.2.1 Uni-Variate Normal Random Vectors

When n = 1 in Theorem 5.3.1, we have N = 1, r1 = 1, s1 = 1, and ℓ11 = 2. Let k = k1δ11, and use m for m1 = E{X1} and ρ̃ for ρ̃11 = σ^2 = Var{X1} by deleting the subscripts for brevity. We can then express (5.3.12) as

  ∂^k E{g(X)}/∂ρ̃^k = (1/2)^k E{ g^{(2k)}(X) }.   (5.3.13)

Meanwhile, for the pdf f_X(x) of X, we have

  lim_{ρ̃→0} f_X(x) = δ(x − m).   (5.3.14)

Based on (5.3.13) and (5.3.14), let us obtain the expected value E{g(X)} for a normal random variable X.

Example 5.3.2 For a normal random variable X ∼ N(m, σ^2), obtain Υ̃ = E{X^3}.

Solution Letting g(x) = x^3, we have g^{(2)}(x) = 6x. Thus, we get ∂Υ̃/∂ρ̃ = (1/2)E{g^{(2)}(X)} = 3E{X} = 3m, i.e., Υ̃ = 3mρ̃ + c from (5.3.13) with k = 1, where c is the integration constant. Subsequently, we have

  E{X^3} = 3mσ^2 + m^3   (5.3.15)

because c = m^3 from Υ̃ → ∫_{−∞}^∞ x^3 δ(x − m) dx = m^3 as ρ̃ → 0, recollecting (5.3.14). ♦

We now derive a general formula for the moment E{X^a}. Let us use an underline as

  n̲ = { (n − 1)/2,  n is odd,
      { n/2,        n is even   (5.3.16)

to denote the quotient ⌊n/2⌋ of a non-negative integer n when divided by 2.

Theorem 5.3.2 For X ∼ N(m, ρ̃), we have

  E{X^a} = Σ_{j=0}^{a̲} ( a! / (2^j j! (a − 2j)!) ) ρ̃^j m^{a−2j}   (5.3.17)

for a = 0, 1, ….

Proof The proof is left as an exercise, Exercise 5.27. ♠

Example 5.3.3 Using (5.3.17), we have

  E{X^4} = 3ρ̃^2 + 6m^2ρ̃ + m^4   (5.3.18)

for X ∼ N(m, ρ̃). When m = 0, (5.3.18) is the same as (3.3.32). ♦
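The moment formula (5.3.17) is straightforward to implement, and Gauss–Hermite quadrature gives an independent cross-check (the quadrature comparison and the parameter values are our additions, not part of the text):

```python
# Check of the moment formula (5.3.17): E{X^a} for X ~ N(m, rho~), compared
# with Gauss-Hermite quadrature, which is exact for polynomial integrands.
import numpy as np
from math import factorial

def moment_formula(a, m, rho):
    # (5.3.17): sum over j = 0, ..., floor(a/2)
    return sum(factorial(a) / (2 ** j * factorial(j) * factorial(a - 2 * j))
               * rho ** j * m ** (a - 2 * j)
               for j in range(a // 2 + 1))

def moment_quadrature(a, m, rho, nodes=40):
    # E{X^a} with the substitution X = m + sqrt(2*rho)*t, weight exp(-t^2)
    t, w = np.polynomial.hermite.hermgauss(nodes)
    return float(np.sum(w * (m + np.sqrt(2 * rho) * t) ** a) / np.sqrt(np.pi))

m, rho = 1.5, 2.0   # assumed example values
print(moment_formula(3, m, rho))  # 3*m*rho + m^3   = 12.375
print(moment_formula(4, m, rho))  # 3*rho^2 + 6*m^2*rho + m^4 = 44.0625
```

The a = 3 and a = 4 cases reproduce (5.3.15) and (5.3.18); higher a can be checked against the quadrature directly.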

5.3.2.2 Bi-variate Normal Random Vectors

When n = 2, specific simpler expressions of (5.3.12) for all possible pairs (n, N) are shown in Table 5.1. Let us consider the expected value E{g1(X1)g2(X2)} for a normal random vector X = (X1, X2) with mean vector m = (m1, m2) and covariance matrix K = [[ρ̃11, ρ̃12], [ρ̃12, ρ̃22]], assuming n = 2, N = 1, r1 = 1, and s1 = 2 in Theorem 5.3.1. Because ℓ11 = 1 and ℓ21 = 1, we can rewrite (5.3.12) as

  ∂^k E{g1(X1)g2(X2)}/∂ρ̃12^k = E{ g1^{(k)}(X1) g2^{(k)}(X2) }.   (5.3.19)

First, find a value k for which the right-hand side E{g1^{(k)}(X1)g2^{(k)}(X2)} of (5.3.19) is simple to evaluate, and then obtain the expected value. Next, integrate the expected value with respect to ρ̃12 to obtain E{g1(X1)g2(X2)}. Note that, when ρ̃12 = 0, we have ρ12 = ρ̃12/(σ1σ2) = 0 and therefore X1 and X2 are independent of each other from Theorem 5.2.3: this implies, from Theorem 4.3.6, that

  E{ g1^{(k)}(X1) g2^{(l)}(X2) }|_{ρ̃12=0} = E{ g1^{(k)}(X1) } E{ g2^{(l)}(X2) }   (5.3.20)

Table 5.1 Specific formulas of Price's theorem for all possible pairs (n, N) when n = 2

(n, N) = (2, 1), (r1, s1) = (1, 1): δ_{r1 s1} = 1, ℓ11 = 2, ℓ21 = 0:
  ∂^k E{g1(X1)g2(X2)}/∂ρ̃11^k = (1/2)^k E{ g1^{(2k)}(X1) g2(X2) }

(n, N) = (2, 1), (r1, s1) = (1, 2): δ_{r1 s1} = 0, ℓ11 = 1, ℓ21 = 1:
  ∂^k E{g1(X1)g2(X2)}/∂ρ̃12^k = E{ g1^{(k)}(X1) g2^{(k)}(X2) }

(n, N) = (2, 2), (r1, s1) = (1, 1), (r2, s2) = (1, 2): δ_{r1 s1} = 1, δ_{r2 s2} = 0, ℓ11 = 2, ℓ21 = 0, ℓ12 = 1, ℓ22 = 1:
  ∂^{k1+k2} E{g1(X1)g2(X2)}/(∂ρ̃11^{k1} ∂ρ̃12^{k2}) = (1/2)^{k1} E{ g1^{(2k1+k2)}(X1) g2^{(k2)}(X2) }

(n, N) = (2, 2), (r1, s1) = (1, 1), (r2, s2) = (2, 2): δ_{r1 s1} = 1, δ_{r2 s2} = 1, ℓ11 = 2, ℓ21 = 0, ℓ12 = 0, ℓ22 = 2:
  ∂^{k1+k2} E{g1(X1)g2(X2)}/(∂ρ̃11^{k1} ∂ρ̃22^{k2}) = (1/2)^{k1+k2} E{ g1^{(2k1)}(X1) g2^{(2k2)}(X2) }

(n, N) = (2, 3), (r1, s1) = (1, 1), (r2, s2) = (1, 2), (r3, s3) = (2, 2): δ_{r1 s1} = 1, δ_{r2 s2} = 0, δ_{r3 s3} = 1, ℓ11 = 2, ℓ21 = 0, ℓ12 = 1, ℓ22 = 1, ℓ13 = 0, ℓ23 = 2:
  ∂^{k1+k2+k3} E{g1(X1)g2(X2)}/(∂ρ̃11^{k1} ∂ρ̃12^{k2} ∂ρ̃22^{k3}) = (1/2)^{k1+k3} E{ g1^{(2k1+k2)}(X1) g2^{(k2+2k3)}(X2) }

for k, l = 0, 1, . . ., which can be used to determine integration constants. In short,


when we can easily evaluate the expected value of the product of the derivatives
of g1 and g2 (e.g., when we have a constant or an impulse function after a few
times of differentiations of g1 and/or g2 ), Theorem 5.3.1 is quite useful in obtaining
E {g1 (X 1 ) g2 (X 2 )}.

Example 5.3.4 For a normal random vector X = (X1, X2) with mean vector m = (m1, m2) and covariance matrix K = [[ρ̃11, ρ̃12], [ρ̃12, ρ̃22]], obtain Υ̃ = E{X1 X2^2}.

Solution With k = 1, g1(x) = x, and g2(x) = x^2 in (5.3.19), we get dΥ̃/dρ̃12 = E{ g1^{(1)}(X1) g2^{(1)}(X2) } = E{2X2} = 2m2, i.e., Υ̃ = 2m2ρ̃12 + c. Recollecting (5.3.20), we have c = Υ̃|_{ρ̃12=0} = m1( ρ̃22 + m2^2 ). Thus, we finally have

  E{X1 X2^2} = 2m2ρ̃12 + m1( ρ̃22 + m2^2 ).   (5.3.21)
 
The result (5.3.21) is the same as the result E{WZ^2} = 2m2ρσ1σ2 + m1( σ2^2 + m2^2 ) for a random vector (W, Z) = (σ1X + m1, σ2Y + m2), which we would obtain after some steps based on E{XY^2} = 0 for (X, Y) ∼ N(0, 0, 1, 1, ρ). In addition, when X1 = X2 = X, (5.3.21) is the same as (5.3.15). ♦
A general formula for the joint moment E{X1^a X2^b} is shown in the theorem below.

Theorem 5.3.3 The joint moment E{X1^a X2^b} can be expressed as

  E{X1^a X2^b} = Σ_{j=0}^{min(a,b)} Σ_{p=0}^{(a−j)̲} Σ_{q=0}^{(b−j)̲} ( a! b! ρ̃12^j ρ̃11^p ρ̃22^q m1^{a−j−2p} m2^{b−j−2q} ) / ( 2^{p+q} j! p! q! (a − j − 2p)! (b − j − 2q)! )   (5.3.22)

for (X1, X2) ∼ N( m1, m2, ρ̃11, ρ̃22, ρ̃12/√(ρ̃11ρ̃22) ), where a, b = 0, 1, ….

Proof A proof is provided in Appendix 5.1. ♠

Example 5.3.5 We can obtain E{X1^2 X2^3} = Σ_{j=0}^{2} Σ_{p=0}^{(2−j)̲} Σ_{q=0}^{(3−j)̲} ( 12 ρ̃12^j ρ̃11^p ρ̃22^q m1^{2−j−2p} m2^{3−j−2q} ) / ( 2^{p+q} j! p! q! (2 − j − 2p)! (3 − j − 2q)! ), i.e.,

  E{X1^2 X2^3} = m1^2 m2^3 + 3ρ̃22 m1^2 m2 + ρ̃11 m2^3 + 6ρ̃12 m1 m2^2 + 3ρ̃11ρ̃22 m2 + 6ρ̃12ρ̃22 m1 + 6ρ̃12^2 m2   (5.3.23)

from (5.3.22). ♦
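Theorem 5.3.3 can be verified numerically; the sketch below (our addition, with assumed parameter values) implements the triple sum (5.3.22) and cross-checks it against two-dimensional Gauss–Hermite quadrature under the corresponding bivariate normal distribution:

```python
# Numerical check of (5.3.22): the triple sum versus 2-D Gauss-Hermite
# quadrature (exact for polynomial integrands) under the bivariate normal
# with mean (m1, m2) and covariance [[r11, r12], [r12, r22]].
import numpy as np
from math import factorial

def joint_moment(a, b, m1, m2, r11, r22, r12):
    total = 0.0
    for j in range(min(a, b) + 1):
        for p in range((a - j) // 2 + 1):
            for q in range((b - j) // 2 + 1):
                total += (factorial(a) * factorial(b)
                          * r12 ** j * r11 ** p * r22 ** q
                          * m1 ** (a - j - 2 * p) * m2 ** (b - j - 2 * q)
                          / (2 ** (p + q) * factorial(j) * factorial(p)
                             * factorial(q) * factorial(a - j - 2 * p)
                             * factorial(b - j - 2 * q)))
    return total

def joint_moment_quad(a, b, m1, m2, r11, r22, r12, nodes=30):
    A = np.linalg.cholesky(np.array([[r11, r12], [r12, r22]]))
    t, w = np.polynomial.hermite.hermgauss(nodes)
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    W = np.outer(w, w)
    x1 = m1 + np.sqrt(2) * (A[0, 0] * T1 + A[0, 1] * T2)
    x2 = m2 + np.sqrt(2) * (A[1, 0] * T1 + A[1, 1] * T2)
    return float(np.sum(W * x1 ** a * x2 ** b) / np.pi)

vals = (0.5, -1.0, 2.0, 1.5, 0.7)   # assumed m1, m2, rho~11, rho~22, rho~12
print(joint_moment(1, 2, *vals))    # 2*m2*r12 + m1*(r22 + m2^2), cf. (5.3.21)
print(joint_moment(2, 3, *vals))    # matches the expansion (5.3.23)
```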
 
Theorem 5.3.4 For (X1, X2) ∼ N(m1, m2, σ1^2, σ2^2, ρ), the joint central moment μab = E{(X1 − m1)^a (X2 − m2)^b} can be obtained as (Johnson and Kotz 1972; Mills 2001; Patel and Read 1996)

  μab = { 0,   a + b is odd,
        { ( a! b! / 2^{g+h+ξ} ) Σ_{j=0}^{t} ( (2ρσ1σ2)^{2j+ξ} σ1^{2(g−j)} σ2^{2(h−j)} ) / ( (g − j)! (h − j)! (2j + ξ)! ),   a + b is even   (5.3.24)

for a, b = 0, 1, … and satisfies the recursion

  μab = (a + b − 1)ρσ1σ2 μ_{a−1,b−1} + (a − 1)(b − 1)(1 − ρ^2)σ1^2σ2^2 μ_{a−2,b−2},   (5.3.25)

where g and h are the quotients of a and b, respectively, when divided by 2; ξ is the residue when a or b is divided by 2; and t = min(g, h).

Example 5.3.6 When a = 2g, b = 2h, and m1 = m2 = 0, all the terms except those satisfying a − j − 2p = 0 and b − j − 2q = 0 will be zero in (5.3.22), and thus we have

  E{X1^a X2^b} = Σ_{j=0,2,…}^{min(a,b)} ( a! b! ρ̃12^j ρ̃11^{(a−j)/2} ρ̃22^{(b−j)/2} ) / ( 2^{g+h−j} j! ((a−j)/2)! ((b−j)/2)! )
              = Σ_{j=0}^{min(g,h)} ( a! b! (2ρ̃12)^{2j} ρ̃11^{g−j} ρ̃22^{h−j} ) / ( 2^{g+h} (2j)! (g − j)! (h − j)! ),   (5.3.26)

which is the same as the second line in the right-hand side of (5.3.24). Similarly, when a = 2g + 1, b = 2h + 1, and m1 = m2 = 0, the result (5.3.22) is the same as the second line in the right-hand side of (5.3.24). ♦

Example 5.3.7 (Gardner 1990) Obtain Υ̃ = E{sgn(X1) sgn(X2)} for X = (X1, X2) ∼ N(0, 0, σ1^2, σ2^2, ρ).

Solution First, note that d sgn(x)/dx = 2δ(x) and that E{δ(X1)δ(X2)} = f(0, 0), where f denotes the pdf of N(0, 0, σ1^2, σ2^2, ρ). Letting k = 1 in (5.3.19), we have dΥ̃/dρ̃ = E{ g1^{(1)}(X1) g2^{(1)}(X2) } = 4f(0, 0) = 2/(πσ1σ2√(1 − ρ^2)), i.e.,

  dΥ̃/dρ = (2/π) · 1/√(1 − ρ^2)   (5.3.27)

because ρ̃ = ρσ1σ2. Integrating this result, we get

  E{sgn(X1) sgn(X2)} = (2/π) sin^{−1}ρ + c,   (5.3.28)

where the range of sin^{−1}x is set as [−π/2, π/2]. Subsequently, because E{sgn(X1) sgn(X2)}|_{ρ=0} = E{sgn(X1)}E{sgn(X2)} = 0 from (5.3.20), we finally have

  E{sgn(X1) sgn(X2)} = (2/π) sin^{−1}ρ.   (5.3.29)

The result (5.3.29) implies that, when X1 = X2, we have E{sgn(X1) sgn(X2)} = E{1} = 1. Table 5.2 provides the expected values for some non-linear functions of (X1, X2) ∼ N(0, 0, 1, 1, ρ). ♦
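The arcsine law (5.3.29) is easy to confirm by simulation; a minimal Monte Carlo sketch (the simulation and the choice ρ = 0.5 are our additions):

```python
# Monte Carlo check of (5.3.29): for standard bivariate normal (X1, X2) with
# correlation rho, E{sgn(X1) sgn(X2)} should approach (2/pi) arcsin(rho).
import numpy as np

rng = np.random.default_rng(1)
rho = 0.5
n = 1_000_000

z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x1 = z1
x2 = rho * z1 + np.sqrt(1 - rho ** 2) * z2   # correlation rho with x1

estimate = np.mean(np.sign(x1) * np.sign(x2))
theory = (2 / np.pi) * np.arcsin(rho)        # = 1/3 for rho = 0.5

print(estimate, theory)
```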


Table 5.2 Expected value E{g1(X1)g2(X2)} for some non-linear functions g1 and g2 of (X1, X2) ∼ N(0, 0, 1, 1, ρ)

              g1(X1):  X1          |X1|                               sgn(X1)           δ(X1)
  g2(X2) = X2          ρ           0                                  √(2/π) ρ          0
  |X2|                 0           (2/π){ ρ sin^{−1}ρ + √(1−ρ^2) }    0                 √(1−ρ^2)/π
  sgn(X2)              √(2/π) ρ    0                                  (2/π) sin^{−1}ρ   0
  δ(X2)                0           √(1−ρ^2)/π                         0                 1/(2π√(1−ρ^2))

Note. d/dρ { ρ sin^{−1}ρ + √(1−ρ^2) } = sin^{−1}ρ. d/dρ sin^{−1}ρ = 1/√(1−ρ^2). E{|Xi|} = √(2/π). E{δ(Xi)} = 1/√(2π).

Denoting the pdf of a standard bi-variate normal random vector (X1, X2) by f_ρ(x, y), we have f_ρ(−x, −y) = f_ρ(x, y) and f_ρ(−x, y) = f_ρ(x, −y) = f_{−ρ}(x, y). Then, it is known (Kamat 1958) that the partial moment [r, s] = ∫_0^∞ ∫_0^∞ x^r y^s f_ρ(x, y) dx dy is

  [r, s] = { (1/4)√(2/π) (1 + ρ),   r = 1, s = 0,
           { (1/(2π)){ ρ(π/2 + sin^{−1}ρ) + √(1 − ρ^2) },   r = 1, s = 1,
           { (1/(2π)){ 3ρ(π/2 + sin^{−1}ρ) + (2 + ρ^2)√(1 − ρ^2) },   r = 3, s = 1   (5.3.30)

and the absolute moment ν_rs = E{|X1^r X2^s|} is

  ν_rs = ( 2^{(r+s)/2}/π ) Γ((r+1)/2) Γ((s+1)/2) 2F1( −r/2, −s/2; 1/2; ρ^2 ).   (5.3.31)

Here, 2F1(α, β; γ; z) denotes the hypergeometric function introduced in (1.A.24). Based on (5.3.30) and (5.3.31), we can obtain ν11 = (2/π){ √(1 − ρ^2) + ρ sin^{−1}ρ }, ν12 = ν21 = √(2/π)(1 + ρ^2), ν13 = ν31 = (2/π){ (2 + ρ^2)√(1 − ρ^2) + 3ρ sin^{−1}ρ }, ν22 = 1 + 2ρ^2, ν14 = ν41 = √(2/π)( 3 + 6ρ^2 − ρ^4 ), ν23 = √(8/π)( 1 + 3ρ^2 ), ν15 = ν51 = (2/π){ (8 + 9ρ^2 − 2ρ^4)√(1 − ρ^2) + 15ρ sin^{−1}ρ }, ν42 = ν24 = 3( 1 + 4ρ^2 ), and ν33 = (2/π){ (4 + 11ρ^2)√(1 − ρ^2) + 3(3 + 2ρ^2)ρ sin^{−1}ρ } (Johnson and Kotz 1972).

5.3.2.3 Tri-variate Normal Random Vectors

Let us briefly discuss the case n = 3 in Theorem 5.3.1. Letting N = 3, r1 = 1, s1 = 2, r2 = 2, s2 = 3, r3 = 3, and s3 = 1, we have ℓ11 = 1, ℓ12 = 0, ℓ13 = 1, ℓ21 = 1, ℓ22 = 1, ℓ23 = 0, ℓ31 = 0, ℓ32 = 1, and ℓ33 = 1. Then, for Υ̃ = E{g1(X1)g2(X2)g3(X3)}, we can rewrite (5.3.12) as

  ∂^{k1+k2+k3} Υ̃ / ( ∂ρ̃12^{k1} ∂ρ̃23^{k2} ∂ρ̃31^{k3} )
    = (1/2)^{Σ_{j=1}^3 kj δ_{rj sj}} E{ g1^{(ℓ11 k1 + ℓ12 k2 + ℓ13 k3)}(X1) g2^{(ℓ21 k1 + ℓ22 k2 + ℓ23 k3)}(X2) g3^{(ℓ31 k1 + ℓ32 k2 + ℓ33 k3)}(X3) }
    = E{ g1^{(k1+k3)}(X1) g2^{(k1+k2)}(X2) g3^{(k2+k3)}(X3) }.   (5.3.32)

In addition, similarly to (5.3.20), we have

  E{ g1^{(j)}(X1) g2^{(k)}(X2) g3^{(l)}(X3) }|_{ρ̃12=ρ̃23=ρ̃31=0} = E{ g1^{(j)}(X1) } E{ g2^{(k)}(X2) } E{ g3^{(l)}(X3) }   (5.3.33)

and

  E{ g1^{(j)}(X1) g2^{(k)}(X2) g3^{(l)}(X3) }|_{ρ̃31=ρ̃12=0} = E{ g1^{(j)}(X1) } E{ g2^{(k)}(X2) g3^{(l)}(X3) }   (5.3.34)

for j, k, l = 0, 1, …. For ρ̃12 = ρ̃23 = 0 and ρ̃23 = ρ̃31 = 0 as well, we can obtain formulas similar to (5.3.34). These formulas can all be used to determine E{g1(X1)g2(X2)g3(X3)}.
Example 5.3.8 Obtain Υ̃ = E{X1X2X3} for X = (X1, X2, X3) ∼ N(m, K) with m = (m1, m2, m3) and K = [ρ̃ij].

Solution From (5.3.32), we have ∂Υ̃/∂ρ̃12 = E{X3} = m3, ∂Υ̃/∂ρ̃23 = E{X1} = m1, and ∂Υ̃/∂ρ̃31 = E{X2} = m2. Thus, Υ̃ = m3ρ̃12 + m1ρ̃23 + m2ρ̃31 + c. Now, when ρ̃12 = ρ̃23 = ρ̃31 = 0, we have Υ̃ = c = E{X1}E{X2}E{X3} = m1m2m3 as we can see from (5.3.33). Thus, we have

  E{X1X2X3} = m1ρ̃23 + m2ρ̃31 + m3ρ̃12 + m1m2m3.   (5.3.35)

The result (5.3.35) is the same as (5.3.9), as (5.3.21) when X2 = X3, and as (5.3.15) when X1 = X2 = X3 = X. For a zero-mean tri-variate normal random vector, (5.3.35) implies E{X1X2X3} = 0. ♦
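Formula (5.3.35) can also be confirmed by simulation; the sketch below (the mean vector and covariance matrix are assumed example values) compares a Monte Carlo estimate of E{X1X2X3} with the closed form:

```python
# Monte Carlo check of (5.3.35) with assumed parameters:
# E{X1 X2 X3} = m1*rho~23 + m2*rho~31 + m3*rho~12 + m1*m2*m3.
import numpy as np

rng = np.random.default_rng(7)
m = np.array([1.0, -1.0, 2.0])
K = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, -0.2],
              [0.1, -0.2, 1.0]])

x = rng.multivariate_normal(m, K, size=1_000_000)
estimate = np.mean(x[:, 0] * x[:, 1] * x[:, 2])

theory = (m[0] * K[1, 2] + m[1] * K[0, 2] + m[2] * K[0, 1]
          + m[0] * m[1] * m[2])               # = -1.7 for these values

print(estimate, theory)
```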

Consider a standard tri-variate normal random vector X ∼ N(0, K3) and its pdf f3(x, y, z) as described in Sect. 5.1.3. For the partial moment

  [r, s, t] = ∫_0^∞ ∫_0^∞ ∫_0^∞ x^r y^s z^t f3(x, y, z) dx dy dz   (5.3.36)

and absolute moment

  ν_rst = E{ |X1^r X2^s X3^t| },   (5.3.37)

it is known that we have

  [1, 0, 0] = (1/√(8π^3)) { (π/2 + sin^{−1}β23,1) + ρ12(π/2 + sin^{−1}β31,2) + ρ31(π/2 + sin^{−1}β12,3) },   (5.3.38)

  [2, 0, 0] = (1/(4π)) { π/2 + Σc sin^{−1}ρij + ρ12√(1 − ρ12^2) + ρ31√(1 − ρ31^2) + (2ρ31ρ12 − ρ23)√(1 − ρ23^2) + √|K3| ρ23/√(1 − ρ23^2) },   (5.3.39)

  [1, 1, 0] = (1/(4π)) { ρ12( π/2 + Σc sin^{−1}ρij ) + √(1 − ρ12^2) + ρ23√(1 − ρ31^2) + ρ31√(1 − ρ23^2) },   (5.3.40)

  [1, 1, 1] = (1/√(8π^3)) { √|K3| + Σc (ρij + ρjk ρki)( π/2 + sin^{−1}βij,k ) },   (5.3.41)

  ν211 = (2/π) { (ρ23 + 2ρ12ρ31) sin^{−1}ρ23 + (1 + ρ12^2 + ρ31^2)√(1 − ρ23^2) },   (5.3.42)

and

  ν221 = √(2/π) ( 1 + 2ρ12^2 + ρ23^2 + ρ31^2 + 4ρ12ρ23ρ31 − ρ23^2ρ31^2 ).   (5.3.43)

Note that the last term √|K3| ρ23/√(1 − ρ23^2) of [2, 0, 0] and the last two terms ρ23√(1 − ρ31^2) + ρ31√(1 − ρ23^2) of [1, 1, 0] appear incorrectly in some references (Johnson and Kotz 1972; Kamat 1958) and should be corrected as in (5.3.39) and (5.3.40), respectively.

In (5.3.39)–(5.3.41), the symbol Σc denotes the cyclic sum: for example, we have

  Σc sin^{−1}ρij = sin^{−1}ρ12 + sin^{−1}ρ23 + sin^{−1}ρ31.   (5.3.44)

     
We will have C(2+3−1, 3) = 4, C(3+3−1, 3) = 10, and C(4+3−1, 3) = 20 different cases, respectively, of the expected value E{g1(X1)g2(X2)g3(X3)} for two, three, and four options as the function gi; this number of multisets is considered in (1.E.24). For a standard tri-variate normal random vector X = (X1, X2, X3), consider the four functions {x, |x|, sgn(x), δ(x)} for gi(x). Among the 20 expected values, due to the symmetry of the standard normal distribution, the four expected values E{X1X2 sgn(X3)}, E{X1 sgn(X2) sgn(X3)}, E{sgn(X1) sgn(X2) sgn(X3)}, and E{X1X2X3} of products of three odd functions and the six expected values E{X1 δ(X2) δ(X3)}, E{sgn(X1) |X2| |X3|}, E{sgn(X1) |X2| δ(X3)}, E{sgn(X1) δ(X2) δ(X3)}, E{X1 |X2| |X3|}, and E{X1 |X2| δ(X3)} of products of one odd function and two even functions are zero.
In addition, we easily get E{δ(X1)δ(X2)δ(X3)} = f3(0, 0, 0), i.e.,

  E{δ(X1)δ(X2)δ(X3)} = 1/√(8π^3 |K3|)   (5.3.45)

based on (5.1.19), and the nine remaining expected values are considered in Exercises 5.21–5.23. The results of these ten expected values are summarized in Table 5.3. Meanwhile, some results shown in Table 5.3 can be verified using (5.3.38)–(5.3.43): for instance, we can reconfirm E{X1 |X2| sgn(X3)} = (1/2)∂ν211/∂ρ31 = (2/π){ ρ12 sin^{−1}ρ23 + ρ31√(1 − ρ23^2) } via ν211 shown in (5.3.42) and E{X1X2|X3|} = (1/4)∂ν221/∂ρ12 = √(2/π)(ρ12 + ρ23ρ31) via ν221 shown in (5.3.43).

Table 5.3 Expected values E{g1(X1)g2(X2)g3(X3)} of some products for a standard tri-variate normal random vector (X1, X2, X3)

  E{δ(X1)δ(X2)δ(X3)} = 1/√(8π^3|K3|)
  E{X1X2|X3|} = √(2/π)(ρ12 + ρ23ρ31)
  E{δ(X1)δ(X2)|X3|} = √(|K3|/(2π^3)) / (1 − ρ12^2)
  E{δ(X1)X2X3} = (1/√(2π))(ρ23 − ρ31ρ12)
  E{δ(X1)sgn(X2)X3} = (ρ23 − ρ31ρ12)/(π√(1 − ρ12^2))
  E{δ(X1)|X2||X3|} = √(2/π^3){ (ρ23 − ρ31ρ12) sin^{−1}β23,1 + √|K3| }
  E{δ(X1)sgn(X2)sgn(X3)} = √(2/π^3) sin^{−1}β23,1
  E{X1|X2|sgn(X3)} = (2/π){ ρ12 sin^{−1}ρ23 + ρ31√(1 − ρ23^2) }
  E{sgn(X1)sgn(X2)|X3|} = √(8/π^3){ sin^{−1}β12,3 + ρ23 sin^{−1}β31,2 + ρ31 sin^{−1}β23,1 }
  E{|X1X2X3|} = √(8/π^3){ √|K3| + Σc (ρij + ρjk ρki) sin^{−1}βij,k }

5.3.3 General Formula for Joint Moments

For a natural number n, let

  a = {a1, a2, …, an}   (5.3.46)

with ai ∈ {0, 1, …} a set of non-negative integers and let
  l = { l11, l12, …, l1n, l22, l23, …, l2n, …, l_{n−1,n−1}, l_{n−1,n}, lnn }   (5.3.47)

with lij ∈ {…, −1, 0, 1, …} a set of integers. Given a and l, define the collection

  Sa = { l : { {lij ≥ 0}_{j=i}^n }_{i=1}^n, {L_{a,k} ≥ 0}_{k=1}^n }   (5.3.48)

of l, where

  L_{a,k} = ak − lkk − Σ_{j=1}^n ljk   (5.3.49)

for k = 1, 2, …, n and

  lji = lij   (5.3.50)

for j > i. A general formula for the joint moments of normal random vectors can now be obtained as shown in the following theorem:

Theorem 5.3.5 For X ∼ N(m, K) with m = (m1, m2, …, mn) and K = [ρ̃ij], we have

  E{ Π_{k=1}^n Xk^{ak} } = Σ_{l∈Sa} d_{a,l} ( Π_{i=1}^n Π_{j=i}^n ρ̃ij^{lij} ) ( Π_{j=1}^n mj^{L_{a,j}} ),   (5.3.51)

where
/ n 0⎛ ⎞−1 ⎛ ⎞−1
* *
n *
n *
n
da,l = 2−Ml ak ! ⎝ l i j !⎠ ⎝ L a, j !⎠ (5.3.52)
k=1 i=1 j=i j=1

+
n
with Ml = lii .
i=1

Proof The proof is shown in Appendix 5.1. ♠

Note that, when any of the n(n+1)/2 elements of l or any of the n elements of {L_{a,j}}_{j=1}^n is a negative integer, we have ( Π_{i=1}^n Π_{j=i}^n lij! )^{−1} ( Π_{j=1}^n L_{a,j}! )^{−1} = 0 because |(−k)!| → ∞ for k = 1, 2, …. Therefore, the collection Sa in the sum Σ_{l∈Sa} of (5.3.51) can be replaced with the collection of all sets of n(n+1)/2 integers. Details for obtaining E{X1X2X3^2} based on Theorem 5.3.5 as an example are shown in Table 5.4 in the case of a = {1, 1, 2}.

Example 5.3.9 Based on Theorem 5.3.5, we easily get

  E{X1X2X3} = m1m2m3 + ρ̃12m3 + ρ̃23m1 + ρ̃31m2   (5.3.53)

and

  E{X1X2X3X4} = m1m2m3m4 + m1m2ρ̃34 + m1m3ρ̃24 + m1m4ρ̃23 + m2m3ρ̃14 + m2m4ρ̃13 + m3m4ρ̃12 + ρ̃12ρ̃34 + ρ̃13ρ̃24 + ρ̃14ρ̃23.   (5.3.54)

Table 5.4 Element sets l = {l11, l12, l13, l22, l23, l33} of Sa, {L_{a,j}}_{j=1}^3, coefficient d_{a,l}, and the terms in E{X1X2X3^2} for each of the seven element sets when a = {1, 1, 2}

     {l11, l12, l13, l22, l23, l33}   {L_{a,1}, L_{a,2}, L_{a,3}}   d_{a,l}   Terms
  1  {0, 0, 0, 0, 0, 0}              {1, 1, 2}                     1         m1m2m3^2
  2  {0, 0, 0, 0, 1, 0}              {1, 0, 1}                     2         2ρ̃23m1m3
  3  {0, 0, 1, 0, 0, 0}              {0, 1, 1}                     2         2ρ̃13m2m3
  4  {0, 0, 1, 0, 1, 0}              {0, 0, 0}                     2         2ρ̃13ρ̃23
  5  {0, 1, 0, 0, 0, 0}              {0, 0, 2}                     1         ρ̃12m3^2
  6  {0, 0, 0, 0, 0, 1}              {1, 1, 0}                     1         ρ̃33m1m2
  7  {0, 1, 0, 0, 0, 1}              {0, 0, 0}                     1         ρ̃12ρ̃33
370 5 Normal Random Vectors

Note that (5.3.53) is the same as (5.3.9) and (5.3.35), and (5.3.54) is the same as
(5.3.10). ♦
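Theorem 5.3.5 lends itself to a direct numerical sanity check: enumerate all candidate sets l, discard those with a negative L_{a,k}, and sum the d_{a,l} terms. Below is a minimal brute-force sketch (the function name joint_moment and the loose enumeration ranges are our own choices, not from the text); for a = (1, 1, 1) it reproduces (5.3.53).

```python
import itertools
import math

def joint_moment(a, m, K):
    # E{prod_k X_k^{a_k}} for X ~ N(m, K), summing d_{a,l} terms as in (5.3.51).
    # Sets l with a negative L_{a,k} are skipped since their coefficient vanishes.
    n = len(a)
    idx = [(i, j) for i in range(n) for j in range(i, n)]
    total = 0.0
    for combo in itertools.product(range(max(a) + 1), repeat=len(idx)):
        l = dict(zip(idx, combo))
        # L_{a,k} = a_k - l_kk - sum_j l_jk, with l_ji = l_ij   (5.3.49)
        L = [a[k] - l[(k, k)] - sum(l[(min(j, k), max(j, k))] for j in range(n))
             for k in range(n)]
        if min(L) < 0:
            continue
        Ml = sum(l[(i, i)] for i in range(n))            # M_l = sum_i l_ii
        d = 2.0 ** (-Ml) * math.prod(map(math.factorial, a))
        d /= math.prod(map(math.factorial, combo))
        d /= math.prod(map(math.factorial, L))           # d_{a,l} of (5.3.52)
        term = d * math.prod(K[i][j] ** v for (i, j), v in l.items())
        term *= math.prod(m[j] ** L[j] for j in range(n))
        total += term
    return total
```

For example, joint_moment([1, 1, 1], m, K) agrees with m_1 m_2 m_3 + ρ̃_{12} m_3 + ρ̃_{23} m_1 + ρ̃_{31} m_2, and joint_moment([2], [m_1], [[σ²]]) gives m_1² + σ².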
When the mean vector is 0 in Theorem 5.3.5, we have

E{ Π_{k=1}^{n} X_k^{a_k} } = ( Π_{k=1}^{n} a_k! ) Σ_{l ∈ T_a} 2^{−M_l} Π_{i=1}^{n} Π_{j=i}^{n} ρ̃_{ij}^{l_{ij}} / l_{ij}!,   (5.3.55)

where T_a denotes the collection of l such that l_{11} + Σ_{k=1}^{n} l_{k1} = a_1, l_{22} + Σ_{k=1}^{n} l_{k2} = a_2, ..., l_{nn} + Σ_{k=1}^{n} l_{kn} = a_n, and {{ l_{ij} ≥ 0 }_{j=i}^{n}}_{i=1}^{n}. In other words, T_a is the same as S_a with L_{a,k} ≥ 0 replaced by L_{a,k} = 0 in (5.3.48).
 +
n
Theorem 5.3.6 We have E X 1a1 X 2a2 · · · X nan = 0 for X ∼ N (0, K ) when ak is
k=1
an odd number.

+
n +
n
Proof Adding lkk + lk j = ak for k = 1, 2, . . . , n, we have ak = M l +
j=1 k=1
+
n +
n +
n−1 +
n +n
li j = 2Ml + 2 li j , which is an even number. Thus, when ak is
i=1 j=1 i=1 j=i+1
 a1 a2 k=1
an odd number, the collection Ta is a null set and E X 1 X 2 . . . X nan = 0. ♠
Example 5.3.10 For a zero-mean n-variate normal random vector, assume a = 1 = {1, 1, ..., 1}. When n is an odd number, E{X_1 X_2 ··· X_n} = 0 from Theorem 5.3.6. Next, assume n is even. Over a non-negative integer region, if l_{kk} = 0 for k = 1, 2, ..., n and one of {l_{1k}, l_{2k}, ..., l_{nk}} − {l_{kk}} is 1 and all the others are 0, we have l_{kk} + Σ_{i=1}^{n} l_{ik} = 1. Now, if l ∈ T_a, because d_{a,l} = d_{1,l} = (1!)^n / { 2^0 Π 1! (0!)^n } = 1, we have (Isserlis 1918)

E{X_1 X_2 ··· X_n} = Σ_{l ∈ T_a} Π_{i=1}^{n} Π_{j=i}^{n} ρ̃_{ij}^{l_{ij}}.   (5.3.56)

Next, assigning 0, 1, and 0 to l_{kk}, one of {l_{1k}, l_{2k}, ..., l_{nk}} − {l_{kk}}, and all the others of {l_{1k}, l_{2k}, ..., l_{nk}} − {l_{kk}}, respectively, for k = 1, 2, ..., n is the same as assigning 1 to each pair after dividing {1, 2, ..., n} into n/2 pairs of two numbers. Here, a pair (j, k) represents the subscript of l_{jk}. Now, recollecting that there are (n/2)! possibilities for the same choice with a different order, the number of ways to divide {1, 2, ..., n} into n/2 pairs of two numbers is C(n, 2) C(n − 2, 2) ··· C(2, 2) {(n/2)!}^{−1} = n! / { 2^{n/2} (n/2)! }. In short, the number of elements in T_a, i.e., the number of non-zero terms on the right-hand side of (5.3.56), is n! / { 2^{n/2} (n/2)! } = (n − 1)!!. ♦
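The pairing count and the Isserlis expansion (5.3.56) can be checked directly by enumerating all partitions of {1, ..., n} into pairs; a small illustrative sketch (function names are ours):

```python
import math

def pairings(elems):
    # Generate all partitions of elems into unordered pairs.
    if not elems:
        yield []
        return
    first = elems[0]
    for i in range(1, len(elems)):
        for rest in pairings(elems[1:i] + elems[i + 1:]):
            yield [(first, elems[i])] + rest

def isserlis(R, n):
    # E{X_1 X_2 ... X_n} for a zero-mean normal vector with covariance R,
    # summed over pair partitions as in (5.3.56); 0 for odd n.
    if n % 2:
        return 0.0
    return sum(math.prod(R[i][j] for i, j in p)
               for p in pairings(list(range(n))))
```

For n = 6 the generator yields (6 − 1)!! = 15 partitions, and for n = 4 the sum reduces to ρ̃_{12}ρ̃_{34} + ρ̃_{13}ρ̃_{24} + ρ̃_{14}ρ̃_{23}, matching the zero-mean part of (5.3.54).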

5.4 Distributions of Statistics

Often, the terms sample and random sample (Abramowitz and Stegun 1972; Gradshteyn and Ryzhik 1980) are used to denote an i.i.d. random vector, especially in statistics. A function of a sample is called a statistic. With a sample⁸ X = (X_1, X_2, ..., X_n) of size n, the mean E{X_i} and the variance Var(X_i) of the component random variable X_i are called the population mean and population variance, respectively. Unless stated otherwise, we assume that the population mean and population variance are m and σ², respectively, for the samples considered in this section. We also denote by E{(X_i − m)^k} = μ_k the k-th population central moment of X_i for k = 0, 1, ....

5.4.1 Sample Mean and Sample Variance

Definition 5.4.1 (sample mean) The statistic

X̄_n = (1/n) Σ_{i=1}^{n} X_i   (5.4.1)

for a sample X = (X_1, X_2, ..., X_n) is called the sample mean of X.

Theorem 5.4.1 We have the expected value

E{X̄_n} = m   (5.4.2)

and variance

Var(X̄_n) = σ²/n   (5.4.3)

for the sample mean X̄_n.

Proof First, E{X̄_n} = (1/n) Σ_{i=1}^{n} E{X_i} = m. Using this result and the fact that E{X_i X_j} = E{X_i} E{X_j} = m² for i ≠ j and E{X_i X_j} = E{X_i²} = σ² + m² for i = j, we have (5.4.3) from Var(X̄_n) = E{X̄_n²} − E²{X̄_n} = E{ (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} X_i X_j } − m² = (1/n²) { Σ_{i=1}^{n} E{X_i²} + Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} E{X_i X_j} } − m² = (1/n²) { n(σ² + m²) + n(n − 1)m² } − m² = σ²/n. ♠

⁸ In several fields including engineering, the term sample is often used to denote an element X_i of X = (X_1, X_2, ..., X_n).
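A quick Monte Carlo illustration of (5.4.2) and (5.4.3), with hypothetical values m = 2, σ = 3, and n = 10 chosen by us (not from the text):

```python
import random
import statistics

random.seed(1)
m, sigma, n, trials = 2.0, 3.0, 10, 20000

# Draw many independent sample means of size n from N(m, sigma^2).
means = [statistics.fmean(random.gauss(m, sigma) for _ in range(n))
         for _ in range(trials)]

est_mean = statistics.fmean(means)      # should be close to m = 2
est_var = statistics.variance(means)    # should be close to sigma^2 / n = 0.9
```

The estimated variance of the sample mean shrinks by the factor n relative to the population variance, as (5.4.3) predicts.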

 
Example 5.4.1 (Rohatgi and Saleh 2001) Obtain the third central moment μ₃(X̄_n) of the sample mean X̄_n.

Solution The third central moment μ₃(X̄_n) = E{(X̄_n − m)³} = (1/n³) E{ ( Σ_{i=1}^{n} (X_i − m) )³ } of X = (X_1, X_2, ..., X_n) can be expressed as

μ₃(X̄_n) = (1/n³) Σ_{i=1}^{n} E{(X_i − m)³} + (3/n³) Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} E{(X_i − m)²(X_j − m)} + (1/n³) Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1, i≠j, j≠k, k≠i}^{n} E{(X_i − m)(X_j − m)(X_k − m)}.   (5.4.4)

Now, noting that E{X_i − m} = 0 and that X_i and X_j are independent of each other for i ≠ j, we have E{(X_i − m)²(X_j − m)} = E{(X_i − m)²} E{X_j − m} = 0 for i ≠ j and E{(X_i − m)(X_j − m)(X_k − m)} = E{X_i − m} E{X_j − m} E{X_k − m} = 0 for i ≠ j, j ≠ k, and k ≠ i. Thus, we have

μ₃(X̄_n) = μ₃ / n²   (5.4.5)

from μ₃(X̄_n) = (1/n³) Σ_{i=1}^{n} E{(X_i − m)³}. ♦
i=1

Definition 5.4.2 (sample variance) The statistic

W_n = {1/(n − 1)} Σ_{i=1}^{n} (X_i − X̄_n)²   (5.4.6)

is called the sample variance of X = (X_1, X_2, ..., X_n).

Theorem 5.4.2 We have

E {Wn } = σ 2 , (5.4.7)

i.e., the expected value of sample variance is equal to the population variance.

Proof Let Y_i = X_i − m. Then, we have E{Y_i} = 0, E{Y_i²} = σ² = μ₂, and E{Y_i⁴} = μ₄. Next, letting Ȳ = (1/n) Σ_{i=1}^{n} Y_i = (1/n) Σ_{i=1}^{n} (X_i − m), we have

Σ_{i=1}^{n} (X_i − X̄_n)² = Σ_{i=1}^{n} (Y_i − Ȳ)²   (5.4.8)

from Σ_{i=1}^{n} (X_i − X̄_n)² = Σ_{i=1}^{n} { (X_i − m) − (X̄_n − m) }² = Σ_{i=1}^{n} { Y_i − (1/n) Σ_{k=1}^{n} (X_k − m) }². In addition, because Σ_{i=1}^{n} (Y_i − Ȳ)² = Σ_{i=1}^{n} Y_i² − 2Ȳ Σ_{i=1}^{n} Y_i + nȲ² = Σ_{i=1}^{n} Y_i² − nȲ², we have

Σ_{i=1}^{n} (X_i − X̄_n)² = Σ_{i=1}^{n} Y_i² − nȲ².   (5.4.9)

Therefore, we have E{ Σ_{i=1}^{n} (X_i − X̄_n)² } = Σ_{i=1}^{n} E{Y_i²} − n E{Ȳ²} = nσ² − (1/n) Σ_{i=1}^{n} Σ_{j=1}^{n} E{Y_i Y_j} = nσ² − (1/n) { n E{Y_i²} + 0 } = (n − 1)σ², and E{W_n} = E{ {1/(n − 1)} Σ_{i=1}^{n} (X_i − X̄_n)² } = σ². ♠

Note that, due to the factor n − 1 instead of n in the denominator of (5.4.6), the
expected value of sample variance is equal to the population variance as shown in
(5.4.7).

Theorem 5.4.3 (Rohatgi and Saleh 2001) We have the variance

Var{W_n} = μ₄/n − (n − 3)μ₂² / {n(n − 1)}   (5.4.10)

of the sample variance W_n.

Proof Letting Y_i = X_i − m, we have E{ ( Σ_{i=1}^{n} Y_i² − nȲ² )² } = E{ Σ_{i=1}^{n} Σ_{j=1}^{n} Y_i² Y_j² − 2nȲ² Σ_{i=1}^{n} Y_i² + n²Ȳ⁴ }, i.e.,

E{ ( Σ_{i=1}^{n} Y_i² − nȲ² )² } = nμ₄ + n(n − 1)μ₂² − 2n E{ Ȳ² Σ_{i=1}^{n} Y_i² } + n² E{Ȳ⁴}.   (5.4.11)

In (5.4.11), E{ Ȳ² Σ_{i=1}^{n} Y_i² } = (1/n²) Σ_{i} Σ_{j} Σ_{k} E{Y_i² Y_j Y_k} = (1/n²) { Σ_{i} E{Y_i⁴} + Σ_{i≠j} E{Y_i² Y_j²} } can be evaluated as

E{ Ȳ² Σ_{i=1}^{n} Y_i² } = (1/n) { μ₄ + (n − 1)μ₂² },   (5.4.12)

and E{Ȳ⁴} = (1/n⁴) Σ_{i} Σ_{j} Σ_{k} Σ_{l} E{Y_i Y_j Y_k Y_l} = (1/n⁴) { Σ_{i} E{Y_i⁴} + 3 Σ_{i≠j} E{Y_i² Y_j²} } can be obtained as⁹

E{Ȳ⁴} = (1/n³) { μ₄ + 3(n − 1)μ₂² }.   (5.4.13)

Next, recollecting (5.4.9), (5.4.12), and (5.4.13), if we rewrite (5.4.11), we have E{ ( Σ_{i=1}^{n} (X_i − X̄_n)² )² } = E{ ( Σ_{i=1}^{n} Y_i² − nȲ² )² } = nμ₄ + n(n − 1)μ₂² − 2{ μ₄ + (n − 1)μ₂² } + (1/n) { μ₄ + 3(n − 1)μ₂² }, i.e.,

E{ ( Σ_{i=1}^{n} (X_i − X̄_n)² )² } = {(n − 1)²/n} μ₄ + {(n − 1)(n² − 2n + 3)/n} μ₂².   (5.4.14)

We get (5.4.10) from Var{W_n} = {1/(n − 1)²} E{ ( Σ_{i=1}^{n} (X_i − X̄_n)² )² } − μ₂² using (5.4.7) and (5.4.14). ♠
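For a normal population μ₄ = 3μ₂², and (5.4.10) then collapses to Var{W_n} = 2μ₂²/(n − 1). This algebraic consequence can be checked in exact arithmetic with the fractions module (an illustrative sketch; the function name is ours):

```python
from fractions import Fraction

def var_sample_variance(mu2, mu4, n):
    # Var{W_n} = mu4/n - (n - 3) mu2^2 / (n (n - 1)), i.e., (5.4.10)
    return Fraction(mu4) / n - Fraction(n - 3) * mu2 ** 2 / (n * (n - 1))

# For a normal population mu4 = 3 mu2^2, so Var{W_n} = 2 mu2^2 / (n - 1).
normal_case = [var_sample_variance(Fraction(1), Fraction(3), n)
               for n in range(2, 12)]
```

With μ₂ = 1 and μ₄ = 3, the list evaluates to 2/(n − 1) for every n, in agreement with the familiar result Var{W_n} = 2σ⁴/(n − 1) for normal samples.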

Theorem 5.4.4 We have

X̄_n ∼ N(μ, σ²/n)   (5.4.15)

⁹ In this formula, the factor 3 results from the three distinct cases of i = j ≠ k = l, i = k ≠ j = l, and i = l ≠ k = j.

and consequently

(√n/σ)(X̄_n − μ) ∼ N(0, 1)   (5.4.16)

for a sample X = (X_1, X_2, ..., X_n) from N(μ, σ²).

   
Proof Recollecting the mgf M(t) = exp( μt + σ²t²/2 ) of N(μ, σ²) and using (5.E.23), the mgf of X̄_n can be obtained as M_{X̄_n}(t) = M^n(t/n) = exp{ μt + (σ/√n)² t²/2 }. Thus, we have (5.4.15), and (5.4.16) follows. ♠

Theorem 5.4.4 can also be shown from Theorem 5.2.5. More generally, we have X̄_n ∼ N(μ̄, σ̄²/n) and (√n/σ̄)(X̄_n − μ̄) ∼ N(0, 1) when { X_i ∼ N(μ_i, σ_i²) }_{i=1}^{n} are independent of each other, where μ̄ = (1/n) Σ_{i=1}^{n} μ_i and σ̄² = (1/n) Σ_{i=1}^{n} σ_i².

Theorem 5.4.5 The sample mean and sample variance of a normal sample are
independent of each other.


Proof We first show that A = X̄_n and B = (V_1, V_2, ..., V_n) = (X_1 − X̄_n, X_2 − X̄_n, ..., X_n − X̄_n) are independent of each other. Letting t′ = (t, t_1, t_2, ..., t_n), the joint mgf M_{A,B}(t′) = E{ exp( t X̄_n + t_1 V_1 + t_2 V_2 + ··· + t_n V_n ) } = E{ exp( t X̄_n + Σ_{i=1}^{n} t_i (X_i − X̄_n) ) } of A and B is obtained as

M_{A,B}(t′) = E{ exp( Σ_{i=1}^{n} t_i X_i − ( Σ_{i=1}^{n} t_i − t ) X̄_n ) }
            = E{ exp( Σ_{i=1}^{n} (1/n)( n t_i + t − Σ_{j=1}^{n} t_j ) X_i ) }.   (5.4.17)

Letting t̄ = (1/n) Σ_{i=1}^{n} t_i, the joint mgf (5.4.17) can be expressed as M_{A,B}(t′) = Π_{i=1}^{n} E{ exp( X_i (t + n t_i − n t̄)/n ) } = Π_{i=1}^{n} exp{ μ(t + n t_i − n t̄)/n + (σ²/2){(t + n t_i − n t̄)/n}² }, or as

M_{A,B}(t′) = exp( μt + σ²t²/2n ) exp( (σ²/2) Σ_{i=1}^{n} (t_i − t̄)² ) exp( (μ + σ²t/n) Σ_{i=1}^{n} (t_i − t̄) ).   (5.4.18)

Noting that Σ_{i=1}^{n} (t_i − t̄) = Σ_{i=1}^{n} t_i − n t̄ = 0, we eventually have

M_{A,B}(t′) = exp( μt + σ²t²/2n ) exp( (σ²/2) Σ_{i=1}^{n} (t_i − t̄)² ).   (5.4.19)

Meanwhile, because A = X̄_n ∼ N(μ, σ²/n) as we have observed in Theorem 5.4.4, the mgf of A is

M_A(t) = exp( μt + σ²t²/2n ).   (5.4.20)

Recollecting Σ_{i=1}^{n} (t_i − t̄) = 0, the mgf of B = (V_1, V_2, ..., V_n) can be obtained as M_B(t) = E{ exp( t_1 V_1 + t_2 V_2 + ··· + t_n V_n ) } = E{ exp( Σ_{i=1}^{n} t_i (X_i − X̄_n) ) } = E{ exp( Σ_{i=1}^{n} (t_i − t̄) X_i ) } = Π_{i=1}^{n} exp{ μ(t_i − t̄) + (σ²/2)(t_i − t̄)² } or, equivalently, as

M_B(t) = exp( (σ²/2) Σ_{i=1}^{n} (t_i − t̄)² ),   (5.4.21)

where t = (t_1, t_2, ..., t_n). In short, from (5.4.19)−(5.4.21), M_{A,B}(t′) = M_A(t) M_B(t): the random variable X̄_n and the random vector B = (V_1, V_2, ..., V_n) are independent of each other. Consequently, recollecting Theorem 4.1.3, the random variables X̄_n and W_n = {1/(n − 1)} Σ_{i=1}^{n} (X_i − X̄_n)², a function of B = (V_1, V_2, ..., V_n), are independent of each other. ♠

Theorem 5.4.5 is an important property of normal samples, and its converse is also known to hold true: if the sample mean and sample variance of a sample are independent of each other, then the sample is from a normal distribution. In the more general case of samples from symmetric marginal distributions, the sample mean and sample variance are known to be uncorrelated (Rohatgi and Saleh 2001).
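The independence in Theorem 5.4.5 can at least be glimpsed numerically: across many normal samples, the sample correlation between X̄_n and W_n stays near zero. A small Monte Carlo sketch (sample size and seed are our own choices):

```python
import random
import statistics

random.seed(7)
n, trials = 8, 5000
xbars, wns = [], []
for _ in range(trials):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbars.append(statistics.fmean(x))
    wns.append(statistics.variance(x))   # divides by n - 1, as in (5.4.6)

# Sample correlation between the two statistics; should be near 0.
mx, mw = statistics.fmean(xbars), statistics.fmean(wns)
num = sum((a - mx) * (b - mw) for a, b in zip(xbars, wns))
den = (sum((a - mx) ** 2 for a in xbars)
       * sum((b - mw) ** 2 for b in wns)) ** 0.5
corr = num / den
```

A near-zero correlation is of course weaker than independence; it is also what the final remark above predicts for any symmetric marginal distribution.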

5.4.2 Chi-Square Distribution

Let us now consider the distributions of some statistics of normal samples.

Definition 5.4.3 (central chi-square pdf) The pdf

f(r) = { 1 / ( 2^{n/2} Γ(n/2) ) } r^{n/2 − 1} exp(−r/2) u(r)   (5.4.22)

is called the (central) chi-square pdf with its distribution denoted by χ²(n), where n is called the degree of freedom.

The central chi-square pdf (5.4.22), an example of which is shown in Fig. 5.4, is the same as the gamma pdf f(r) = { 1/( β^α Γ(α) ) } r^{α−1} exp(−r/β) u(r) introduced in (2.5.31) with α and β replaced by n/2 and 2, respectively: in other words, χ²(n) = G(n/2, 2).
n

Theorem 5.4.6 The square of a standard normal random variable is a χ2 (1) random
variable.

Theorem 5.4.6 is proved in Example 3.2.20. Based on Theorem 5.4.6, we have (n/σ²)(X̄_n − μ)² ∼ χ²(1) for a sample of size n from N(μ, σ²) and, more generally, (n/σ̄²)(X̄_n − μ̄)² ∼ χ²(1) when { X_i ∼ N(μ_i, σ_i²) }_{i=1}^{n} are independent of each other.

Theorem 5.4.7 We have the mgf

M_Y(t) = (1 − 2t)^{−n/2}, t < 1/2   (5.4.23)

and moments

E{Y^k} = 2^k Γ(k + n/2) / Γ(n/2)   (5.4.24)

for a random variable Y ∼ χ²(n).

Fig. 5.4 A central chi-square pdf (n = 6)

Proof Using (5.4.22), the mgf M_Y(t) = E{e^{tY}} can be obtained as

M_Y(t) = ∫_0^∞ { 1 / ( 2^{n/2} Γ(n/2) ) } x^{n/2 − 1} exp( −(1 − 2t)x/2 ) dx.   (5.4.25)

Now, letting y = (1 − 2t)x/2 for t < 1/2, we have x = 2y/(1 − 2t) and dx = 2dy/(1 − 2t). Thus, recollecting (1.4.65), we obtain M_Y(t) = { 1 / ( 2^{n/2} Γ(n/2) ) } ∫_0^∞ { 2y/(1 − 2t) }^{n/2 − 1} e^{−y} { 2/(1 − 2t) } dy = (1 − 2t)^{−n/2} { 1/Γ(n/2) } ∫_0^∞ y^{n/2 − 1} e^{−y} dy, which results in (5.4.23). The moments of Y can easily be obtained as E{Y^k} = d^k/dt^k M_Y(t) |_{t=0} = (−2)^k (−n/2)(−n/2 − 1) ··· (−n/2 − (k − 1)) = 2^k (n/2)(n/2 + 1) ··· (n/2 + k − 1), resulting in (5.4.24). ♠
Example 5.4.2 For Y ∼ χ2 (n), we have the expected value E{Y } = n and variance
Var(Y ) = 2n from (5.4.24). ♦
Example 5.4.3 (Rohatgi and Saleh 2001) For X_n ∼ χ²(n), obtain the limit distributions of Y_n = X_n/n² and Z_n = X_n/n.

Solution Recollecting that M_{X_n}(t) = (1 − 2t)^{−n/2}, we have lim_{n→∞} M_{Y_n}(t) = lim_{n→∞} M_{X_n}(t/n²) = lim_{n→∞} (1 − 2t/n²)^{−n/2} = lim_{n→∞} exp(t/n) = 1 and, consequently, P(Y_n = 0) → 1. Similarly, we get lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} (1 − 2t/n)^{−n/2} = e^t and, consequently, P(Z_n = 1) → 1. ♦
When { X_i ∼ χ²(k_i) }_{i=1}^{n} are independent of each other, the mgf of S_n = Σ_{i=1}^{n} X_i can be obtained as M_{S_n}(t) = Π_{i=1}^{n} (1 − 2t)^{−k_i/2} = (1 − 2t)^{−(1/2) Σ_{i=1}^{n} k_i} based on the mgf shown in (5.4.23) of X_i. This result proves the following theorem:
Theorem 5.4.8 When { X_i ∼ χ²(k_i) }_{i=1}^{n} are independent of each other, we have Σ_{i=1}^{n} X_i ∼ χ²( Σ_{i=1}^{n} k_i ). In addition, if { X_i ∼ N(μ_i, σ_i²) }_{i=1}^{n} are independent of each other, then Σ_{i=1}^{n} { (X_i − μ_i)/σ_i }² ∼ χ²(n).

Definition 5.4.4 (non-central chi-square distribution) For independent normal random variables { X_i ∼ N(μ_i, σ²) }_{i=1}^{n} with an identical variance σ², the distribution of Y = Σ_{i=1}^{n} X_i²/σ² is called the non-central chi-square distribution and is denoted by χ²(n, δ), where n and δ = Σ_{i=1}^{n} μ_i²/σ² are called the degree of freedom and non-centrality parameter, respectively.

The pdf of χ²(n, δ) is

f(x) = { 1 / ( 2^{n/2} √π ) } x^{n/2 − 1} exp( −(x + δ)/2 ) Σ_{j=0}^{∞} { (δx)^j Γ(j + 1/2) } / { (2j)! Γ(j + n/2) } u(x).   (5.4.26)

Recollecting Γ(1/2) = √π shown in (1.4.83), it is easy to see that (5.4.26) with δ = 0 is the same as (5.4.22): in other words, χ²(n, 0) is χ²(n). In Exercise 5.32, it is shown that

E{Y} = n + δ,   (5.4.27)

σ_Y² = 2n + 4δ,   (5.4.28)

and

M_Y(t) = (1 − 2t)^{−n/2} exp( δt/(1 − 2t) ), t < 1/2   (5.4.29)

are the mean, variance, and mgf, respectively, for Y ∼ χ²(n, δ).
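A Monte Carlo sketch of (5.4.27) and (5.4.28), built directly from Definition 5.4.4 with the hypothetical choice μ = (1, 1, 1) and σ = 1, so that n = 3 and δ = 3:

```python
import random
import statistics

random.seed(5)
mus = [1.0, 1.0, 1.0]
n, delta, trials = len(mus), sum(m * m for m in mus), 20000

# Y = sum_i (X_i / sigma)^2 with X_i ~ N(mu_i, 1), i.e., Y ~ chi-square(n, delta)
draws = [sum(random.gauss(m, 1.0) ** 2 for m in mus) for _ in range(trials)]

mean_est = statistics.fmean(draws)       # ~ n + delta = 6
var_est = statistics.variance(draws)     # ~ 2n + 4 delta = 18
```

The empirical mean and variance land near n + δ = 6 and 2n + 4δ = 18, as (5.4.27) and (5.4.28) state.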
Theorem 5.4.9 If { X_i ∼ χ²(k_i, δ_i) }_{i=1}^{n} are independent of each other, then S_n = Σ_{i=1}^{n} X_i ∼ χ²( Σ_{i=1}^{n} k_i, Σ_{i=1}^{n} δ_i ).

Proof From the mgf shown in (5.4.29) of X_i, the mgf of S_n can be obtained as M_{S_n}(t) = Π_{i=1}^{n} (1 − 2t)^{−k_i/2} exp( tδ_i/(1 − 2t) ) = (1 − 2t)^{−(1/2) Σ_{i=1}^{n} k_i} exp( {t/(1 − 2t)} Σ_{i=1}^{n} δ_i ). In other words, Σ_{i=1}^{n} X_i ∼ χ²( Σ_{i=1}^{n} k_i, Σ_{i=1}^{n} δ_i ). ♠

Theorem 5.4.8 is a special case of Theorem 5.4.9.

5.4.3 t Distribution

Definition 5.4.5 (central t pdf) The pdf

f(r) = { Γ((n + 1)/2) / ( Γ(n/2) √(nπ) ) } ( 1 + r²/n )^{−(n+1)/2}   (5.4.30)

is called the central t pdf with the corresponding distribution denoted as t(n), where the natural number n is called the degree of freedom.

Fig. 5.5 A central t pdf

The central t pdf with the degree of freedom of 1 is a Cauchy pdf: in other words, t(1) = C(0, 1). Figure 5.5 shows an example of the central t pdf.
Example 5.4.4 When f(v) = 5³ a ( 5 + v² )^{−3} is a pdf, obtain the value of a.

Solution From ∫_{−∞}^{∞} f(v) dv = 5³ a ∫_{−∞}^{∞} ( 5 + v² )^{−3} dv = 1, we get a = 8/(3√5 π). Alternatively, comparing (5.4.30) with n = 5 and f(v) = 5³ a ( 5 + v² )^{−3}, we have 5³ a = 5³ Γ(3) / { Γ(5/2) √(5π) }, i.e., a = 2 / { (3√π/4) √(5π) } = 8/(3√5 π). ♦

Example 5.4.5 Obtain the limit of the central t pdf (5.4.30) as n → ∞.

Solution From (1.4.77), we have lim_{n→∞} Γ((n + 1)/2) / { √(nπ) Γ(n/2) } = lim_{n→∞} √(n/2) { 1/√(nπ) } = 1/√(2π). In addition, lim_{n→∞} ( 1 + x²/n )^{−(n+1)/2} = lim_{n→∞} { ( 1 + x²/n )^{n} }^{−x²/2 · 1/x²} ( 1 + x²/n )^{−1/2} = exp(−x²/2). In short, the limit of the central t pdf for n → ∞ is the standard normal pdf. ♦
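Both the normalization of (5.4.30) and the normal limit in Example 5.4.5 can be verified numerically; a sketch using composite Simpson quadrature on a truncated interval (truncation point, step count, and evaluation point x = 1 are our own choices, and lgamma keeps large n from overflowing):

```python
import math

def t_pdf(x, n):
    # Central t pdf (5.4.30), with lgamma for numerical stability at large n.
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c) * (1 + x * x / n) ** (-(n + 1) / 2)

def simpson(f, a, b, steps=2000):
    # Composite Simpson's rule; steps must be even.
    h = (b - a) / steps
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, steps // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, steps // 2))
    return s * h / 3

area = simpson(lambda x: t_pdf(x, 3), -60.0, 60.0)   # normalization of (5.4.30)
phi1 = math.exp(-0.5) / math.sqrt(2 * math.pi)       # standard normal pdf at 1
gap = abs(t_pdf(1.0, 10 ** 6) - phi1)                # Example 5.4.5 limit
```

The t(3) tail beyond |x| = 60 contributes only about 10⁻⁵, so the quadrature returns essentially 1, and at n = 10⁶ the t pdf already sits within about 10⁻⁵ of φ(1).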

Theorem 5.4.10 (Rohatgi and Saleh 2001) We have

√n X / √Y ∼ t(n)   (5.4.31)

when X ∼ N(0, 1) and Y ∼ χ²(n) are independent of each other.

Proof Let T = √n X/√Y and W = Y. Then, X = T √(W/n) and the Jacobian of the inverse transformation (X, Y) = g^{−1}(T, W) = ( T √(W/n), W ) is J(g^{−1}(t, w)) = det | √(w/n)   t/(2√(wn)) ; 0   1 | = √(w/n). Thus, the joint pdf of (T, W) can be obtained as f_{T,W}(t, w) = √( w/(2πn) ) exp( −t²w/2n ) { w^{n/2 − 1} / ( 2^{n/2} Γ(n/2) ) } exp(−w/2) u(w), i.e.,

f_{T,W}(t, w) = { w^{(n−1)/2} / ( √(2πn) 2^{n/2} Γ(n/2) ) } exp{ −(w/2)( 1 + t²/n ) } u(w).   (5.4.32)

Next, letting (w/2)( 1 + t²/n ) = v, we have ( 1 + t²/n ) dw = 2dv. Thus, the pdf of T can be obtained as f_T(t) = ∫_{−∞}^{∞} f_{T,W}(t, w) dw = { 1 / ( √(πn) Γ(n/2) ) } ( 1 + t²/n )^{−(n+1)/2} ∫_0^∞ v^{(n−1)/2} e^{−v} dv, or as

f_T(t) = { Γ((n + 1)/2) / ( Γ(n/2) √(nπ) ) } ( 1 + t²/n )^{−(n+1)/2},   (5.4.33)

confirming the theorem. ♠



The statistic T = √n ( X̄_n − μ ) / √(W_n), a function of the sample mean X̄_n and the sample variance W_n, is called the t statistic and is widely used in statistics. Now, based on Theorem 5.4.10, let us consider a property of the t statistic from normal samples.
 
Theorem 5.4.11 When X is a sample of size n from X ∼ N(μ, σ²), we have

(n − 1) W_n / σ² ∼ χ²(n − 1)   (5.4.34)

and

√n ( X̄_n − μ ) / √(W_n) ∼ t(n − 1)   (5.4.35)

for the sample mean X̄_n and the sample variance W_n of X.

Proof Recollecting that X_i − X̄_n = (X_i − μ) − (X̄_n − μ) and Σ_{i=1}^{n} (X_i − μ)(X̄_n − μ) = n(X̄_n − μ)², we can rewrite the sample variance as W_n = {1/(n − 1)} Σ_{i=1}^{n} (X_i − X̄_n)² = {1/(n − 1)} Σ_{i=1}^{n} { (X_i − μ)² − 2(X_i − μ)(X̄_n − μ) + (X̄_n − μ)² } = {1/(n − 1)} { Σ_{i=1}^{n} (X_i − μ)² − n(X̄_n − μ)² }. Thus, we have

Σ_{i=1}^{n} { (X_i − μ)/σ }² = {(n − 1)/σ²} W_n + (n/σ²)( X̄_n − μ )².   (5.4.36)

Now, we have

(n/σ²)( X̄_n − μ )² ∼ χ²(1)   (5.4.37)

as observed in Theorem 5.4.6 and

Σ_{i=1}^{n} { (X_i − μ)/σ }² ∼ χ²(n)   (5.4.38)

as observed in Theorem 5.4.8. Recollecting that {(n − 1)/σ²} W_n and (n/σ²)( X̄_n − μ )² are independent of each other as discussed in Theorem 5.4.5, the mgf of the statistic Σ_{i=1}^{n} { (X_i − μ)/σ }² in (5.4.36) can be obtained as

E{ exp( t Σ_{i=1}^{n} { (X_i − μ)/σ }² ) } = E{ exp( t (n − 1)W_n/σ² + t (n/σ²)( X̄_n − μ )² ) }
 = E{ exp( t (n − 1)W_n/σ² ) } E{ exp( t (n/σ²)( X̄_n − μ )² ) }.   (5.4.39)

Combining (5.4.37) and (5.4.38) into (5.4.39), we get

(1 − 2t)^{−n/2} = E{ exp( t (n − 1)W_n/σ² ) } (1 − 2t)^{−1/2}   (5.4.40)

or E{ exp( t (n − 1)W_n/σ² ) } = (1 − 2t)^{−(n−1)/2}, which implies

(n − 1) W_n / σ² ∼ χ²(n − 1).   (5.4.41)

Next, the distribution of { √n ( X̄_n − μ )/σ } / √{ (n − 1)W_n/σ² · 1/(n − 1) } = √n ( X̄_n − μ ) / √(W_n) is t(n − 1) from Theorem 5.4.10. ♠

Example 5.4.6 For a sample from N(1, σ²), it is easy to see that T = √7 ( X̄_7 − 1 ) / √(W_7) ∼ t(6) from Theorem 5.4.11. ♦
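A Monte Carlo sketch of (5.4.34): with our own choice n = 9, the scaled sample variance (n − 1)W_n/σ² should have mean ≈ 8 and variance ≈ 16, matching χ²(8) (see Example 5.4.2):

```python
import random
import statistics

random.seed(3)
mu, sigma, n, trials = 1.0, 2.0, 9, 20000

scaled = []
for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    wn = statistics.variance(x)                   # sample variance (5.4.6)
    scaled.append((n - 1) * wn / sigma ** 2)      # should be chi-square(n - 1)

mean_est = statistics.fmean(scaled)               # ~ n - 1 = 8
var_est = statistics.variance(scaled)             # ~ 2(n - 1) = 16
```

The empirical mean and variance match the χ²(n − 1) values n − 1 and 2(n − 1), consistent with Theorem 5.4.11.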

Definition 5.4.6 (non-central t distribution) When the two random variables X ∼ N(μ, σ²) and Y/σ² ∼ χ²(n) are independent of each other, the distribution of

Z = √n X / √Y   (5.4.42)

is called the non-central t distribution with the degree of freedom of n and non-centrality parameter δ = μ/σ, and is denoted by t(n, δ).

The pdf of t(n, δ) is

f(x) = { n^{n/2} exp(−δ²/2) / ( √π Γ(n/2) ( n + x² )^{(n+1)/2} ) } Σ_{j=0}^{∞} { Γ((n + j + 1)/2) δ^j / j! } { 2x²/( n + x² ) }^{j/2}.   (5.4.43)

Comparing the pdf's (5.4.30) and (5.4.43), it is easy to see that the non-central t distribution t(n, 0) is the same as the central t distribution t(n). In Exercise 5.34, we obtain the mean

E{Z} = δ { Γ((n − 1)/2) / Γ(n/2) } √(n/2), n > 1   (5.4.44)

and variance

Var{Z} = n(1 + δ²)/(n − 2) − (nδ²/2) { Γ((n − 1)/2) / Γ(n/2) }², n > 2   (5.4.45)

of Z ∼ t(n, δ).

Definition 5.4.7 (bi-variate t distribution) When the joint pdf of (X, Y) is

f_{X,Y}(x, y) = { 1 / ( 2π σ_1 σ_2 √(1 − ρ²) ) } [ 1 + { 1/( n(1 − ρ²) ) } { ( (x − μ_1)/σ_1 )² − 2ρ (x − μ_1)(y − μ_2)/( σ_1 σ_2 ) + ( (y − μ_2)/σ_2 )² } ]^{−(n+2)/2},   (5.4.46)

the random vector (X, Y) is called a bi-variate t random vector. The distribution, denoted by t( μ_1, μ_2, σ_1², σ_2², ρ, n ), of (X, Y) is called a bi-variate t distribution.

For (X, Y) ∼ t( μ_1, μ_2, σ_1², σ_2², ρ, n ), we have the means E{X} = μ_1 and E{Y} = μ_2, variances Var{X} = { n/(n − 2) } σ_1² and Var{Y} = { n/(n − 2) } σ_2² for n > 2, and correlation coefficient ρ. The parameter n determines how fast the pdf (5.4.46) decays to 0 as |x| → ∞ or as |y| → ∞. When n = 1, the bi-variate t pdf is the same as the bi-variate Cauchy pdf, and the bi-variate t pdf converges to the bi-variate normal pdf as n gets larger. In addition, we have E{X|Y} = ρsY, E{XY} = ρsn/(n − 2), and

E{X²|Y} = { s²/(n − 1) } { (1 − ρ²)n + ( 1 + (n − 2)ρ² ) Y² }   (5.4.47)

for (X, Y) ∼ t( 0, 0, s², 1, ρ, n ).

5.4.4 F Distribution

Definition 5.4.8 (central F pdf) The pdf

f(r) = { Γ((m + n)/2) / ( Γ(m/2) Γ(n/2) ) } (m/n)^{m/2} r^{m/2 − 1} ( 1 + (m/n) r )^{−(m+n)/2} u(r)   (5.4.48)

is called the central F pdf with the degree of freedom of (m, n) and its distribution is denoted by F(m, n).
The F distribution, together with the chi-square and t distributions, plays an important role in mathematical statistics. In Exercise 5.35, it is shown that the moment of H ∼ F(m, n) is

E{H^k} = (n/m)^k { Γ(m/2 + k) Γ(n/2 − k) } / { Γ(m/2) Γ(n/2) }   (5.4.49)

for k = 1, 2, ..., ⌈n/2⌉ − 1. Figure 5.6 shows the pdf of F(4, 3).
Theorem 5.4.12 (Rohatgi and Saleh 2001) We have

nX / (mY) ∼ F(m, n)   (5.4.50)

when X ∼ χ²(m) and Y ∼ χ²(n) are independent of each other.

Proof Let H = nX/(mY). Assuming the auxiliary variable V = Y, we have X = (m/n) H V and Y = V. Because the Jacobian of the inverse transformation (X, Y) = g^{−1}(H, V) = ( (m/n) H V, V ) is J(g^{−1}(r, v)) = det | (m/n)v   (m/n)r ; 0   1 | = (m/n)v, we have the joint pdf f_{H,V}(r, v) = (mv/n) f_{X,Y}( (m/n)vr, v ) of (H, V) as

f_{H,V}(r, v) = (mv/n) f_X( (m/n) vr ) f_Y(v).   (5.4.51)

Fig. 5.6 The pdf of F(4, 3) (m = 4, n = 3)

Now, the marginal pdf f_H(r) = ∫_{−∞}^{∞} f_{H,V}(r, v) dv = ∫_{−∞}^{∞} (mv/n) f_X( (m/n)vr ) f_Y(v) dv of H can be obtained as¹⁰

f_H(r) = (m/n) { 1 / ( Γ(m/2) Γ(n/2) ) } (1/2)^{(m+n)/2} { (m/n) r }^{m/2 − 1} ∫_0^∞ v^{(m+n)/2 − 1} exp{ −(v/2)( (m/n)r + 1 ) } dv u(r)
 = { Γ((m + n)/2) / ( Γ(m/2) Γ(n/2) ) } (m/n)^{m/2} r^{m/2 − 1} ( 1 + (m/n) r )^{−(m+n)/2} u(r)   (5.4.52)

by noting that f_X(x) = { (1/2)^{m/2} / Γ(m/2) } x^{m/2 − 1} e^{−x/2} u(x) and f_Y(y) = { (1/2)^{n/2} / Γ(n/2) } y^{n/2 − 1} e^{−y/2} u(y). ♠
Example 5.4.7 Show that 1/X ∼ F(n, m) when X ∼ F(m, n).

Solution Based on (3.2.27) and (5.4.48), the pdf of Y = 1/X is f_Y(y) = (1/y²) f_X(1/y) = (1/y²) { Γ((m + n)/2) / ( Γ(m/2) Γ(n/2) ) } (m/n)^{m/2} (1/y)^{m/2 − 1} { ny/(ny + m) }^{(m+n)/2} u(1/y), since 1 + (m/n)(1/y) = (ny + m)/(ny). Collecting powers of y and of (ny + m), this simplifies to f_Y(y) = { Γ((m + n)/2) / ( Γ(m/2) Γ(n/2) ) } (m/n)^{m/2} n^{(m+n)/2} y^{n/2 − 1} (ny + m)^{−(m+n)/2} u(y), i.e.,

f_Y(y) = { Γ((m + n)/2) / ( Γ(m/2) Γ(n/2) ) } (n/m)^{n/2} y^{n/2 − 1} ( 1 + (n/m) y )^{−(m+n)/2} u(y)   (5.4.53)

by noting that u(1/y) = u(y). ♦

Example 5.4.8 When n → ∞, find the limit of the pdf of F(m, n).

Solution Let us first rewrite the pdf f_F(x) as

f_F(x) = { 1/Γ(m/2) } (1/x) [ { Γ((m + n)/2) / Γ(n/2) } (mx/n)^{m/2} ] ( 1 + (m/n)x )^{−(m+n)/2} u(x),   (5.4.54)

and denote the factor in brackets by A. Using (1.4.77), we have lim_{n→∞} A = lim_{n→∞} (n/2)^{m/2} (mx/n)^{m/2} = (mx/2)^{m/2} and

lim_{n→∞} ( 1 + (m/n)x )^{−(m+n)/2} = lim_{n→∞} ( 1 + (m/n)x )^{−n/2} ( 1 + (m/n)x )^{−m/2} = exp( −(m/2)x ).   (5.4.55)

Thus, letting a = m/2, we get

lim_{n→∞} f_F(x) = { a/Γ(a) } (ax)^{a−1} exp(−ax) u(x).   (5.4.56)

In other words, when n → ∞, F(m, n) → G(m/2, 2/m), where G(α, β) denotes the gamma distribution described by the pdf (2.5.31). ♦

¹⁰ Here, note that { Γ((m + n)/2) }^{−1} ∫_0^∞ (1/2)^{(m+n)/2} w^{(m+n)/2 − 1} e^{−w/2} dw = 1 when we let w = v( 1 + (m/n)r ).
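The limit in Example 5.4.8 can be checked in log-space, where lgamma avoids the overflow that Γ((m + n)/2) would cause for large n; the function names and the values m = 4, x = 1.3, n = 10⁶ below are our own illustrative choices:

```python
import math

def log_f_F(x, m, n):
    # log of the central F pdf (5.4.48)
    return (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
            + (m / 2) * math.log(m / n) + (m / 2 - 1) * math.log(x)
            - ((m + n) / 2) * math.log1p(m * x / n))

def log_gamma_limit(x, m):
    # log of the limiting pdf (5.4.56) with a = m/2, i.e., G(m/2, 2/m)
    a = m / 2
    return math.log(a) - math.lgamma(a) + (a - 1) * math.log(a * x) - a * x

gap = abs(log_f_F(1.3, 4, 10 ** 6) - log_gamma_limit(1.3, 4))
```

At n = 10⁶ the two log-densities agree to roughly five decimal places, consistent with the O(1/n) convergence of (5.4.54)−(5.4.56).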
Example 5.4.9 In Example 5.4.8, we obtained the limit of the pdf of F(m, n) when n → ∞. Now, obtain the limit when m → ∞. Then, based on the result, when X ∼ F(m, n) and m → ∞, obtain the pdf of 1/X.

Solution Rewrite the pdf f_F(x) as

f_F(x) = { 1/Γ(n/2) } (1/x) [ { Γ((m + n)/2) / Γ(m/2) } { n/(n + mx) }^{n/2} ] [ { mx/(n + mx) }^{m/2} ] u(x),   (5.4.57)

and denote the two factors in brackets by B and C, respectively. Then, if we let b = n/2, we get

lim_{m→∞} f_F(x) = { 1/( b Γ(b) ) } (b/x)^{b+1} exp(−b/x) u(x)   (5.4.58)

noting that lim_{m→∞} B = lim_{m→∞} (m/2)^{n/2} { n/(mx) }^{n/2} = { n/(2x) }^{n/2} from (1.4.77) and that lim_{m→∞} C = lim_{m→∞} { 1 + n/(mx) }^{−m/2} = exp( −n/(2x) ). Figure 5.7 shows the pdf of F(m, 10) for some values of m, and Fig. 5.8 shows three pdf's¹¹ of F(m, n) for m → ∞.

Next, for m → ∞, the pdf lim_{m→∞} f_{1/X}(y) = lim_{m→∞} (1/y²) f_F(1/y) = { 1/( b Γ(b) ) } (1/y²) (by)^{b+1} exp(−by) u(y) of 1/X can be obtained as

lim_{m→∞} f_{1/X}(y) = { b/Γ(b) } (by)^{b−1} exp(−by) u(y).   (5.4.59)
¹¹ The maximum is at x = (a − 1)/a = (m − 2)/m in Fig. 5.7 and at x = b/(b + 1) = n/(n + 2) in Fig. 5.8.
Fig. 5.7 The pdf f_{F(m,10)}(x) for some values of m (m = 10, 20, 100, and m → ∞; n = 10)

Fig. 5.8 The limit lim_{m→∞} f_{F(m,n)}(x) of the pdf of F(m, n) (n = 0.5, 10, 100)

In other words, 1/X ∼ G(n/2, 2/n) when m → ∞. ♦

Because 1/X ∼ F(n, m) when X ∼ F(m, n) as we have observed in (5.4.53) and F(m, n) → G(m/2, 2/m) for n → ∞ as we have observed in Example 5.4.8, we have 1/X ∼ G(n/2, 2/n) for m → ∞ when X ∼ F(m, n): Example 5.4.9 shows this result directly. In addition, based on this result and (3.2.26), we have n/(2X) ∼ G(n/2, 1) for m → ∞ when X ∼ F(m, n).

Theorem 5.4.13 (Rohatgi and Saleh 2001) If X = (X_1, X_2, ..., X_m) from N(μ_X, σ_X²) and Y = (Y_1, Y_2, ..., Y_n) from N(μ_Y, σ_Y²) are independent of each other, then

( σ_Y² W_{X,m} ) / ( σ_X² W_{Y,n} ) ∼ F(m − 1, n − 1)   (5.4.60)

and

{ (X̄_m − μ_X) − (Ȳ_n − μ_Y) } √(m + n − 2) / √[ { (m − 1)W_{X,m}/σ_X² + (n − 1)W_{Y,n}/σ_Y² } ( σ_X²/m + σ_Y²/n ) ] ∼ t(m + n − 2),   (5.4.61)

where X̄_m and W_{X,m} are the sample mean and sample variance of X, respectively, and Ȳ_n and W_{Y,n} are the sample mean and sample variance of Y, respectively.
Proof From (5.4.41), we have { (m − 1)/σ_X² } W_{X,m} ∼ χ²(m − 1) and { (n − 1)/σ_Y² } W_{Y,n} ∼ χ²(n − 1). Thus, (5.4.60) follows from Theorem 5.4.12. Next, noting that X̄_m − Ȳ_n ∼ N( μ_X − μ_Y, σ_X²/m + σ_Y²/n ), that { (m − 1)/σ_X² } W_{X,m} + { (n − 1)/σ_Y² } W_{Y,n} ∼ χ²(m + n − 2), and that these two statistics are independent of each other, we easily get (5.4.61) from Theorem 5.4.10. ♠

Definition 5.4.9 (non-central F distribution) When X ∼ χ²(m, δ) and Y ∼ χ²(n) are independent of each other, the distribution of

H = nX / (mY)   (5.4.62)

is called the non-central F distribution with the degree of freedom of (m, n) and non-centrality parameter δ, and is denoted by F(m, n, δ).

The pdf of the non-central F distribution F(m, n, δ) is

f(x) = { m^{m/2} n^{n/2} x^{m/2 − 1} / ( Γ(n/2) exp(δ/2) ) } Σ_{j=0}^{∞} { (δmx/2)^j Γ((m + n + 2j)/2) } / { j! Γ((m + 2j)/2) (mx + n)^{(m+n+2j)/2} } u(x).   (5.4.63)

Here, the pdf (5.4.63) for δ = 0 indicates that F(m, n, 0) is the central F distribution F(m, n). In Exercise 5.36, we obtain the mean

E{H} = n(m + δ) / { m(n − 2) }   (5.4.64)

for n = 3, 4, ... and variance

Var{H} = 2n² { (m + δ)² + (n − 2)(m + 2δ) } / { m² (n − 4)(n − 2)² }   (5.4.65)

for n = 5, 6, ... of F(m, n, δ).



Appendices

Appendix 5.1 Proof of General Formula for Joint Moments

The general formula (5.3.51) for the joint moments of normal random vectors is proved via mathematical induction here. First note that

∂/∂ρ̃_{ij} E{ Π_{k=1}^{n} X_k^{a_k} } = a_i a_j E{ X_i^{a_i − 1} X_j^{a_j − 1} Π_{k=1, k≠i,j}^{n} X_k^{a_k} }   (5.A.1)

for i ≠ j and

∂/∂ρ̃_{ii} E{ Π_{k=1}^{n} X_k^{a_k} } = (1/2) a_i (a_i − 1) E{ X_i^{a_i − 2} Π_{k=1, k≠i}^{n} X_k^{a_k} }   (5.A.2)

for i = j from (5.3.12).



(1) Let us show that (5.3.22) holds true. Express E{X_1^a X_2^b} as

E{X_1^a X_2^b} = Σ_{j=0}^{min(a,b)} Σ_{p=0}^{a−j} Σ_{q=0}^{b−j} d_{a,b,j,p,q} ρ̃_{12}^{j} ρ̃_{11}^{p} ρ̃_{22}^{q} m_1^{a−j−2p} m_2^{b−j−2q}.   (5.A.3)

Then, when ρ̃_{12} = 0, we have E{X_1^a X_2^b} = E{X_1^a} E{X_2^b}, i.e.,

Σ_{p=0}^{a} Σ_{q=0}^{b} d_{a,b,0,p,q} ρ̃_{11}^{p} ρ̃_{22}^{q} m_1^{a−2p} m_2^{b−2q} = ( Σ_{p=0}^{a} d_{a,p} ρ̃_{11}^{p} m_1^{a−2p} ) ( Σ_{q=0}^{b} d_{b,q} ρ̃_{22}^{q} m_2^{b−2q} )   (5.A.4)

because the coefficient of ρ̃_{11}^{p} m_1^{a−2p} is d_{a,p} = a! / { 2^p p! (a − 2p)! } when E{X_1^a} is expanded as we can see from (5.3.17). Thus, we get

d_{a,b,0,p,q} = a! b! / { 2^{p+q} p! q! (a − 2p)! (b − 2q)! }.   (5.A.5)

Next, from (5.A.1), we get

∂/∂ρ̃_{12} E{X_1^a X_2^b} = ab E{X_1^{a−1} X_2^{b−1}}.   (5.A.6)

The left- and right-hand sides of (5.A.6) can be expressed as

∂/∂ρ̃_{12} E{X_1^a X_2^b} = Σ_{j=1}^{min(a,b)} Σ_{p=0}^{a−j} Σ_{q=0}^{b−j} j d_{a,b,j,p,q} ρ̃_{12}^{j−1} ρ̃_{11}^{p} ρ̃_{22}^{q} m_1^{a−j−2p} m_2^{b−j−2q}
 = Σ_{j=0}^{min(a,b)−1} Σ_{p=0}^{a−1−j} Σ_{q=0}^{b−1−j} (j + 1) d_{a,b,j+1,p,q} ρ̃_{12}^{j} ρ̃_{11}^{p} ρ̃_{22}^{q} m_1^{a−1−j−2p} m_2^{b−1−j−2q}   (5.A.7)

and

E{X_1^{a−1} X_2^{b−1}} = Σ_{j=0}^{min(a−1,b−1)} Σ_{p=0}^{a−1−j} Σ_{q=0}^{b−1−j} d_{a−1,b−1,j,p,q} ρ̃_{12}^{j} ρ̃_{11}^{p} ρ̃_{22}^{q} m_1^{a−1−j−2p} m_2^{b−1−j−2q},   (5.A.8)

respectively, using (5.A.3). Taking into consideration that min(a, b) − 1 = min(a − 1, b − 1), we get

d_{a,b,j+1,p,q} = { ab/(j + 1) } d_{a−1,b−1,j,p,q}   (5.A.9)

from (5.A.6)−(5.A.8). Using (5.A.5) in (5.A.9) recursively, we obtain

d_{a,b,j+1,p,q} = a! b! / { 2^{p+q} p! q! (j + 1)! (a − j − 1 − 2p)! (b − j − 1 − 2q)! },   (5.A.10)

which is equivalent to the coefficient shown in (5.3.22).


(2) We have so far shown that (5.3.51) holds true when n = 2, and that (5.3.51) holds true when n = 1 as shown in Exercise 5.27. Now, assume that (5.3.51) holds true when n = m − 1. Then, because

E{ X_1^{a_1} X_2^{a_2} ··· X_m^{a_m} } = E{ X_1^{a_1} X_2^{a_2} ··· X_{m−1}^{a_{m−1}} } E{ X_m^{a_m} }   (5.A.11)

when ρ̃_{1m} = ρ̃_{2m} = ··· = ρ̃_{m−1,m} = 0, we get [ d_{a_2,l_2} ]_{l_{1m}→0, l_{2m}→0, ..., l_{m−1,m}→0} = d_{a_1,l_1} d_{a_m,l_{mm}}, i.e.,

[ d_{a_2,l_2} ]_{l_{1m}→0, l_{2m}→0, ..., l_{m−1,m}→0} = ( Π_{k=1}^{m} a_k! ) / { 2^{M_{l_2}} ζ_{m−1}(l) l_{mm}! (a_m − 2l_{mm})! η_{1,m−1} }   (5.A.12)

from (5.3.51) with n = m − 1 and (5.3.17), where M_l = Σ_{i=1}^{n} l_{ii}, a_1 = { a_i }_{i=1}^{m−1}, a_2 = a_1 ∪ { a_m }, l_1 = {{ l_{ij} }_{j=i}^{m−1}}_{i=1}^{m−1}, l_2 = l_1 ∪ { l_{im} }_{i=1}^{m}, ζ_m(l) = Π_{i=1}^{m} Π_{j=i}^{m} l_{ij}!, and η_{k,m} = Π_{j=1}^{m} L_{a_k,j}!. Here, the symbol → denotes a substitution: for example, α → β means the substitution of α with β.
Next, employing (5.A.1) with (i, j) = (1, m), (2, m), ..., (m − 1, m), we will get

[ d_{a_2,l_2} ]_{l_{im}→l_{im}+1} = a_i! a_m! [ d_{a_2,l_2} ]_{l_{im}→0, a_i→a_i−l_{im}−1, a_m→a_m−l_{im}−1} / { (a_i − l_{im} − 1)! (a_m − l_{im} − 1)! (l_{im} + 1)! }   (5.A.13)

for i = 1, 2, ..., m − 1 in a fashion similar to that leading to (5.A.10) from (5.A.6). By changing l_{im} + 1 into l_{im} in (5.A.13), we have

d_{a_2,l_2} = a_i! a_m! [ d_{a_2,l_2} ]_{l_{im}→0, a_i→a_i−l_{im}, a_m→a_m−l_{im}} / { (a_i − l_{im})! (a_m − l_{im})! l_{im}! }   (5.A.14)

for i = 1, 2, ..., m − 1. Now, letting l_{2m} = 0 in (5.A.14) with i = 1, we get

[ d_{a_2,l_2} ]_{l_{2m}→0} = a_1! a_m! [ d_{a_2,l_2} ]_{l_{1m}→0, l_{2m}→0, a_1→a_1−l_{1m}, a_m→a_m−l_{1m}} / { (a_1 − l_{1m})! (a_m − l_{1m})! l_{1m}! }.   (5.A.15)

Using (5.A.15) in (5.A.14) with i = 2, we obtain

d_{a_2,l_2} = { a_2! a_m! / ( (a_2 − l_{2m})! (a_m − l_{2m})! l_{2m}! ) } × { a_1! (a_m − l_{2m})! / ( (a_1 − l_{1m})! (a_m − l_{1m} − l_{2m})! l_{1m}! ) } × [ d_{a_2,l_2} ]_{l_{1m}→0, l_{2m}→0, a_1→a_1−l_{1m}, a_2→a_2−l_{2m}, a_m→a_m−l_{1m}−l_{2m}}.   (5.A.16)

Subsequently, letting l_{3m} = 0 in (5.A.16), we get

[ d_{a_2,l_2} ]_{l_{3m}→0} = { a_1! a_2! a_m! / ( l_{1m}! l_{2m}! (a_1 − l_{1m})! (a_2 − l_{2m})! (a_m − l_{1m} − l_{2m})! ) } × [ d_{a_2,l_2} ]_{l_{km}→0 for k=1,2,3; a_1→a_1−l_{1m}, a_2→a_2−l_{2m}, a_m→a_m−l_{1m}−l_{2m}},   (5.A.17)

which can be employed into (5.A.14) with i = 3 to produce

d_{a_2,l_2} = ( Π_{k=1}^{3} a_k! ) a_m! / { Π_{k=1}^{3} l_{km}! Π_{k=1}^{3} (a_k − l_{km})! ( a_m − Σ_{k=1}^{3} l_{km} )! } × [ d_{a_2,l_2} ]_{l_{km}→0, a_k→a_k−l_{km} for k=1,2,3; a_m→a_m−Σ_{k=1}^{3} l_{km}}.   (5.A.18)

If we repeat the steps above until we reach i = m − 1, using [ d_{a_2,l_2} ]_{l_{m−1,m}→0} obtained by letting l_{m−1,m} = 0 in (5.A.18) with i = m − 2 and recollecting (5.A.14) with i = m − 1, we will eventually get

d_{a_2,l_2} = ( Π_{k=1}^{m−1} a_k! ) a_m! / { Π_{k=1}^{m−1} l_{km}! Π_{k=1}^{m−1} (a_k − l_{km})! ( a_m − Σ_{k=1}^{m−1} l_{km} )! } × [ d_{a_2,l_2} ]_{l_{km}→0, a_k→a_k−l_{km} for k=1,2,...,m−1; a_m→a_m−Σ_{k=1}^{m−1} l_{km}}.   (5.A.19)

Finally, noting that a_m − Σ_{k=1}^{m−1} l_{km} − 2l_{mm} = a_m − l_{mm} − Σ_{k=1}^{m} l_{km} = L_{a_2,m} and that [ L_{a_1,j} ]_{a_k→a_k−l_{km} for k=1,2,...,m−1} = a_j − l_{jm} − l_{jj} − Σ_{k=1}^{m−1} l_{kj} = a_j − l_{jj} − Σ_{k=1}^{m} l_{kj} = L_{a_2,j} for j = 1, 2, ..., m − 1, if we combine (5.A.12) and (5.A.19), we can get

d_{a_2,l_2} = ( Π_{k=1}^{m−1} a_k! ) a_m! / { Π_{k=1}^{m−1} l_{km}! Π_{k=1}^{m−1} (a_k − l_{km})! ( a_m − Σ_{k=1}^{m−1} l_{km} )! } × { Π_{k=1}^{m−1} (a_k − l_{km})! ( a_m − Σ_{k=1}^{m−1} l_{km} )! } / { 2^{M_{l_2}} ζ_{m−1}(l) l_{mm}! ( a_m − Σ_{k=1}^{m−1} l_{km} − 2l_{mm} )! D_{2,2} }
 = ( Π_{k=1}^{m} a_k! ) / { 2^{M_{l_2}} ζ_m(l) η_{2,m}(l) },   (5.A.20)

which implies that (5.3.51) holds true also when n = m, where D_{2,2} = Π_{j=1}^{m−1} [ L_{a_1,j} ]_{a_k→a_k−l_{km} for k=1,2,...,m−1}!.

Appendix 5.2 Some Integral Formulas

For the quadratic function

$$Q(x) = \sum_{j=1}^{n}\sum_{i=1}^{n} a_{ij} x_i x_j \tag{5.A.21}$$

of $x = (x_1, x_2, \ldots, x_n)$, consider

$$J_n = \int_0^{\infty}\int_0^{\infty}\cdots\int_0^{\infty} \exp\{-Q(x)\}\, dx, \tag{5.A.22}$$

where $dx = dx_1 dx_2 \cdots dx_n$. When $n = 1$ with $Q(x) = a_{11} x_1^2$, we easily get

$$J_1 = \frac{\sqrt{\pi}}{2\sqrt{a_{11}}} \tag{5.A.23}$$

for $a_{11} > 0$. When $n = 2$, assume $Q(x) = a_{11} x_1^2 + a_{22} x_2^2 + 2a_{12} x_1 x_2$, where $\Delta_2 = a_{11} a_{22} - a_{12}^2 > 0$. We then get

$$J_2 = \frac{1}{2\sqrt{\Delta_2}}\left(\frac{\pi}{2} - \tan^{-1}\frac{a_{12}}{\sqrt{\Delta_2}}\right). \tag{5.A.24}$$

In addition, when $n = 3$, assume $Q(x) = a_{11} x_1^2 + a_{22} x_2^2 + a_{33} x_3^2 + 2a_{12} x_1 x_2 + 2a_{23} x_2 x_3 + 2a_{31} x_3 x_1$, where $\Delta_3 = a_{11} a_{22} a_{33} - a_{11} a_{23}^2 - a_{22} a_{31}^2 - a_{33} a_{12}^2 + 2a_{12} a_{23} a_{31} > 0$ and $\{a_{ii} > 0\}_{i=1}^{3}$. Then, we will get

$$J_3 = \frac{\sqrt{\pi}}{4\sqrt{\Delta_3}}\left\{\frac{\pi}{2} + \sum^{c} \tan^{-1}\frac{a_{ij} a_{ki} - a_{ii} a_{jk}}{\sqrt{a_{ii}}\sqrt{\Delta_3}}\right\} \tag{5.A.25}$$

after some manipulations, where $\sum^{c}$ denotes the cyclic sum defined in (5.3.44).
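The closed forms (5.A.23) and (5.A.24) can be sanity-checked by numerical integration over a truncated grid; a small sketch (the coefficient values, the truncation point 8, and the grid size are arbitrary choices, and numpy is assumed to be available):

```python
import numpy as np

def trapz(f, h):
    # composite trapezoidal rule along the last axis (uniform spacing h)
    return (f[..., :-1] + f[..., 1:]).sum(axis=-1) * h / 2.0

a11, a22, a12 = 2.0, 3.0, 0.8
delta2 = a11 * a22 - a12**2                      # Delta_2 > 0

x = np.linspace(0.0, 8.0, 1201)                  # truncation of [0, infinity)
h = x[1] - x[0]

# J1 of (5.A.23)
J1_num = trapz(np.exp(-a11 * x**2), h)
J1_cf = np.sqrt(np.pi) / (2.0 * np.sqrt(a11))

# J2 of (5.A.24)
X, Y = np.meshgrid(x, x, indexing="ij")
J2_num = trapz(trapz(np.exp(-(a11*X**2 + a22*Y**2 + 2.0*a12*X*Y)), h), h)
J2_cf = (np.pi/2.0 - np.arctan(a12/np.sqrt(delta2))) / (2.0*np.sqrt(delta2))

print(abs(J1_num - J1_cf) < 1e-4, abs(J2_num - J2_cf) < 1e-4)
```

Since the integrands decay like $e^{-Q(x)}$, the truncation error at 8 is negligible compared with the tolerances used above.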
Now, recollect the standard normal pdf

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right) \tag{5.A.26}$$

and the standard normal cdf

$$\Phi(x) = \int_{-\infty}^{x} \phi(t)\, dt \tag{5.A.27}$$

defined in (3.5.2) and (3.5.3), respectively. Based on (5.A.23) or on $\int_{-\infty}^{\infty}\exp\left(-\alpha x^2\right) dx = \sqrt{\frac{\pi}{\alpha}}$ shown in (3.3.28), we get $\int_{-\infty}^{\infty}\phi^m(x)\, dx = \frac{1}{(2\pi)^{m/2}}\int_{-\infty}^{\infty}\exp\left(-\frac{mx^2}{2}\right) dx$, i.e.,

$$\int_{-\infty}^{\infty}\phi^m(x)\, dx = (2\pi)^{-\frac{m-1}{2}}\, m^{-\frac{1}{2}}. \tag{5.A.28}$$

For $n = 0, 1, \ldots$, consider

$$I_n(a) = 2\pi\int_{-\infty}^{\infty}\Phi^n(ax)\,\phi^2(x)\, dx = \int_{-\infty}^{\infty}\Phi^n(ax)\exp\left(-x^2\right) dx. \tag{5.A.29}$$

Letting $n = 0$ in (5.A.29), we have

$$I_0(a) = \sqrt{\pi}, \tag{5.A.30}$$

and letting $a = 0$ in (5.A.29), we have

$$I_n(0) = \frac{\sqrt{\pi}}{2^n}. \tag{5.A.31}$$

Because $2m+1$ is an odd number for $m = 0, 1, \ldots$, we have

$$\int_{-\infty}^{\infty}\left\{\Phi(ax)-\frac{1}{2}\right\}^{2m+1}\exp\left(-x^2\right) dx = 0, \tag{5.A.32}$$

which can subsequently be expressed as $\sum_{i=0}^{2m+1}\left(-\frac{1}{2}\right)^i {}_{2m+1}C_i\, I_{2m+1-i}(a) = 0$ from (5.A.29) and $\left\{\Phi(ax)-\frac{1}{2}\right\}^{2m+1} = \sum_{i=0}^{2m+1}\left(-\frac{1}{2}\right)^i {}_{2m+1}C_i\,\Phi^{2m+1-i}(ax)$. This result in turn can be rewritten as

$$I_{2m+1}(a) = \sum_{i=1}^{2m+1} 2^{-i}(-1)^{i+1}\, {}_{2m+1}C_i\, I_{2m+1-i}(a) \tag{5.A.33}$$

for $m = 0, 1, \ldots$ after some steps. Thus, when $m = 0$, from (5.A.30) and (5.A.33), we get $I_1(a) = \frac{1}{2}I_0(a)$, i.e.,

$$I_1(a) = \frac{\sqrt{\pi}}{2}. \tag{5.A.34}$$

Similarly, when $m = 1$, from (5.A.30), (5.A.33), and (5.A.34), we get $I_3(a) = \frac{3}{2}I_2(a) - \frac{3}{4}I_1(a) + \frac{1}{8}I_0(a)$, i.e.,

$$I_3(a) = \frac{3}{2}I_2(a) - \frac{1}{4}I_0(a). \tag{5.A.35}$$

Next, recollecting that $\frac{d}{da}\Phi(ax) = x\phi(ax)$ and $\frac{d}{da}\Phi^2(ax) = 2x\Phi(ax)\phi(ax)$, if we differentiate $I_2(a)$ with respect to $a$ using Leibnitz's rule (3.2.18), integrate by parts, and then use (3.3.29), we get

$$\begin{aligned} \frac{d}{da}I_2(a) & = 2\pi\int_{-\infty}^{\infty} 2x\Phi(ax)\phi(ax)\phi^2(x)\, dx = \frac{2}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\Phi(ax)\, x\exp\left(-\frac{2+a^2}{2}x^2\right) dx \\ & = \frac{2}{\sqrt{2\pi}}\left[-\frac{\Phi(ax)}{2+a^2}\exp\left(-\frac{2+a^2}{2}x^2\right)\right]_{x=-\infty}^{\infty} + \frac{2}{\sqrt{2\pi}}\,\frac{a}{2+a^2}\int_{-\infty}^{\infty}\phi(ax)\exp\left(-\frac{2+a^2}{2}x^2\right) dx \\ & = \frac{a}{\pi\left(2+a^2\right)}\int_{-\infty}^{\infty}\exp\left\{-\left(1+a^2\right)x^2\right\} dx, \end{aligned}$$

i.e.,

$$\frac{d}{da}I_2(a) = \frac{a}{\sqrt{\pi}\left(2+a^2\right)\sqrt{1+a^2}}. \tag{5.A.36}$$

Consequently, noting (5.A.31) and $\frac{d}{da}\tan^{-1}\sqrt{1+a^2} = \frac{a}{\left(2+a^2\right)\sqrt{1+a^2}}$ from $\frac{d}{dx}\tan^{-1}x = \frac{1}{1+x^2}$, we finally obtain

$$I_2(a) = \frac{1}{\sqrt{\pi}}\tan^{-1}\sqrt{1+a^2}, \tag{5.A.37}$$

and then

$$I_3(a) = \frac{3}{2\sqrt{\pi}}\tan^{-1}\sqrt{1+a^2} - \frac{\sqrt{\pi}}{4} \tag{5.A.38}$$

from (5.A.35) and (5.A.37). The results $\{J_k\}_{k=1}^{3}$ and $\{I_k(a)\}_{k=0}^{3}$ we have derived so far, together with $\phi'(x) = -x\phi(x)$, are quite useful in obtaining the moments of order statistics of the standard normal distribution for small values of $n$.
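The closed forms (5.A.37) and (5.A.38) can likewise be cross-checked against the defining integral (5.A.29); a small standard-library sketch, with $\Phi$ computed from the error function (the value $a = 1.3$, the integration range, and the step count are arbitrary choices):

```python
import math

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def I_n(n, a, lo=-10.0, hi=10.0, steps=20000):
    # I_n(a) = integral of Phi^n(ax) * exp(-x^2), composite trapezoidal rule
    h = (hi - lo) / steps
    s = 0.5 * (Phi(a*lo)**n * math.exp(-lo*lo) + Phi(a*hi)**n * math.exp(-hi*hi))
    for i in range(1, steps):
        x = lo + i * h
        s += Phi(a*x)**n * math.exp(-x*x)
    return s * h

a = 1.3
I2_cf = math.atan(math.sqrt(1.0 + a*a)) / math.sqrt(math.pi)     # (5.A.37)
I3_cf = 1.5 * I2_cf - 0.25 * math.sqrt(math.pi)                  # (5.A.35)/(5.A.38)
print(abs(I_n(2, a) - I2_cf) < 1e-8, abs(I_n(3, a) - I3_cf) < 1e-8)
```

The integrand decays like $e^{-x^2}$, so truncating at $\pm 10$ introduces no visible error at these tolerances.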

Appendix 5.3 Generalized Gaussian, Generalized Cauchy, and Stable Distributions

In many fields including signal processing, communication, and control, it is usually assumed that noise is a normal random variable. The rationale for this is as follows: the first reason is due to the central limit theorem, which will be discussed in Chap. 6. According to the central limit theorem, the sum of random variables will converge to a normal random variable under certain conditions, and the sum can reasonably be approximated by a normal random variable even when the conditions are not satisfied perfectly. We have already observed such a case in the Gaussian approximation of the binomial distribution in Theorem 3.5.16. In essence, the mathematical model of the Gaussian assumption on noise does not deviate much from reality. The second reason is that, if we assume that noise is Gaussian, many schemes of communications and signal processing can be obtained in a simple way and the analysis of such schemes becomes relatively easy.
On the other hand, in some real environments, noise can be described only by
non-Gaussian distributions. For example, it is reported that the low frequency noise
in the atmosphere and noise environment in underwater acoustics can be modeled
adequately only with non-Gaussian distributions. When noise is non-Gaussian, it
would be necessary to adopt an adequate model other than the Gaussian model
for the real environment in finding, for instance, signal processing techniques or
communication schemes. Needless to say, in such an environment, we could still
apply techniques obtained under Gaussian assumption on noise at the cost of some,
and sometimes significant, loss and/or unpredictability in the performance.
Among the non-Gaussian distributions, impulsive distributions, also called long-tailed or heavy-tailed distributions, constitute an important class. In general, when the tail of the pdf of a distribution is heavier (longer) than that of a normal distribution, the distribution is called an impulsive distribution. In impulsive distributions, noise of very large magnitude or absolute value (that is, values much larger or smaller than the median) can occur more frequently than in the normal distribution.
Let us here briefly discuss the generalized Gaussian distribution and the generalized Cauchy distribution. In addition, the class of stable distributions (Nikias and Shao 1995; Tsihrintzis and Nikias 1995), which is based on the generalized central limit theorem, will also be introduced. In passing, the generalized central limit theorem, not covered in this book, allows us to consider the convergence of random variables of which the variance is not finite and to which, therefore, the central limit theorem cannot be applied.
[Fig. 5.9 The generalized normal pdf $f_{GG}(x)$ for $\sigma_G = 1$ and $k = 0.5, 1, 2, 10, \infty$]

(A) Generalized Gaussian Distribution

Definition 5.A.1 (generalized normal distribution) A distribution with the pdf

$$f_{GG}(x) = \frac{k}{2A_G(k)\,\Gamma\left(\frac{1}{k}\right)}\exp\left\{-\left(\frac{|x|}{A_G(k)}\right)^k\right\} \tag{5.A.39}$$

is called a generalized normal or generalized Gaussian distribution, where $k > 0$ and $A_G(k) = \sqrt{\frac{\sigma_G^2\,\Gamma\left(\frac{1}{k}\right)}{\Gamma\left(\frac{3}{k}\right)}}$.

As is also clear in Fig. 5.9, the pdf of the generalized normal distribution is a unimodal even function, defined by two parameters: the variance $\sigma_G^2$ and the rate $k$ of decay of the pdf.

The generalized normal pdf is usefully employed in representing many pdf's by adopting appropriate values of $k$. For example, when $k = 2$, the generalized normal pdf is a normal pdf. When $k < 2$, the generalized normal pdf is an impulsive pdf: specifically, when $k = 1$, the generalized normal pdf is the double exponential pdf

$$f_D(x) = \frac{1}{\sqrt{2}\,\sigma_G}\exp\left(-\frac{\sqrt{2}\,|x|}{\sigma_G}\right). \tag{5.A.40}$$

The moment of a random variable $X$ with the pdf $f_{GG}(x)$ in (5.A.39) is

$$E\left\{X^r\right\} = \sigma_G^r\,\frac{\Gamma^{\frac{r}{2}-1}\left(\frac{1}{k}\right)\Gamma\left(\frac{r+1}{k}\right)}{\Gamma^{\frac{r}{2}}\left(\frac{3}{k}\right)} \tag{5.A.41}$$

when $r$ is even. In addition, recollecting (1.4.76), we have $\lim_{k\to\infty}\frac{3}{k}\Gamma\left(\frac{3}{k}\right) = 1$ and $\lim_{k\to\infty}\frac{1}{k}\Gamma\left(\frac{1}{k}\right) = 1$, and therefore

$$\lim_{k\to\infty} A_G(k) = \sqrt{3}\,\sigma_G \tag{5.A.42}$$

and $\lim_{k\to\infty}\frac{k}{2A_G(k)\Gamma\left(\frac{1}{k}\right)} = \frac{1}{2\sqrt{3}\,\sigma_G}$. Next, for $k\to\infty$, the limit of the exponential function in (5.A.39) is 1 when $|x| \le A_G(k)$, or equivalently when $|x| \le \sqrt{3}\,\sigma_G$, and 0 when $|x| > A_G(k)$. Therefore, for $k\to\infty$, we have

$$f_{GG}(x) \to \frac{1}{2\sqrt{3}\,\sigma_G}\, u\left(\sqrt{3}\,\sigma_G - |x|\right). \tag{5.A.43}$$

In other words, for $k\to\infty$, the limit of the generalized normal pdf is a uniform pdf as shown in Fig. 5.9.
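The parameterization (5.A.39) can be checked numerically: for any $k$, the pdf should integrate to 1 with variance $\sigma_G^2$. A small sketch (numpy is assumed; the grid over $[-40, 40]$ is an arbitrary truncation that keeps even the heavy $k = 0.5$ tail negligible):

```python
import numpy as np
from math import gamma, sqrt

def f_GG(x, k, sigma_G=1.0):
    # generalized normal pdf (5.A.39); A_G(k) = sqrt(sigma^2 Gamma(1/k)/Gamma(3/k))
    A = sqrt(sigma_G**2 * gamma(1.0/k) / gamma(3.0/k))
    return k / (2.0 * A * gamma(1.0/k)) * np.exp(-(np.abs(x)/A)**k)

x = np.linspace(-40.0, 40.0, 400001)
h = x[1] - x[0]
checks = {}
for k in (0.5, 1.0, 2.0, 10.0):
    f = f_GG(x, k)
    checks[k] = (f.sum()*h, (x**2 * f).sum()*h)   # (area, variance), both ~ 1

for k, (area, var) in checks.items():
    print(k, round(area, 2), round(var, 2))
```

The $k = 2$ case reproduces $N\left(0, \sigma_G^2\right)$ and $k = 1$ the double exponential pdf (5.A.40).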

(B) Generalized Cauchy Distribution

Definition 5.A.2 (generalized Cauchy distribution) A distribution with the pdf

$$f_{GC}(x) = \frac{\tilde{B}_c(k, v)}{\tilde{D}_c^{\,v+\frac{1}{k}}(x)} \tag{5.A.44}$$

is called the generalized Cauchy distribution and is denoted by $GC(k, v)$. Here, $k > 0$, $v > 0$, $\tilde{B}_c(k, v) = \frac{k\,\Gamma\left(v+\frac{1}{k}\right)}{2v^{\frac{1}{k}} A_G(k)\,\Gamma(v)\,\Gamma\left(\frac{1}{k}\right)}$, and $\tilde{D}_c(x) = 1 + \frac{1}{v}\left(\frac{|x|}{A_G(k)}\right)^k$.

Figure 5.10 shows the generalized Cauchy pdf. When the parameter $v$ is finite, the tail of the generalized Cauchy pdf shows not an exponential, but an algebraic behavior. Specifically, when $|x|$ is large, the tail of the generalized Cauchy pdf $f_{GC}(x)$ decreases in proportion to $|x|^{-(kv+1)}$.

When $k = 2$ and $2v$ is an integer, the generalized Cauchy pdf is a $t$ pdf, and when $k = 2$ and $v = \frac{1}{2}$, the generalized Cauchy pdf is a Cauchy pdf

$$f_C(x) = \frac{\sigma_G}{\pi\left(x^2 + \sigma_G^2\right)}. \tag{5.A.45}$$

When the parameters $\sigma_G^2$ and $k$ are fixed, we have $\lim_{v\to\infty}\tilde{D}_c^{\,v+\frac{1}{k}}(x) = \lim_{v\to\infty}\left\{1 + \frac{1}{v}\left(\frac{|x|}{A_G(k)}\right)^k\right\}^{v+\frac{1}{k}}$, i.e.,

$$\lim_{v\to\infty}\tilde{D}_c^{\,v+\frac{1}{k}}(x) = \exp\left\{\left(\frac{|x|}{A_G(k)}\right)^k\right\}. \tag{5.A.46}$$

[Fig. 5.10 The generalized Cauchy pdf $f_{GC}(x)$ for $v = 10$, $\sigma_G = 1$, and $k = 0.5, 1, 2, 10, \infty$]

In addition, $\lim_{v\to\infty}\tilde{B}_c(k, v) = \frac{k}{2A_G(k)\Gamma\left(\frac{1}{k}\right)}\lim_{v\to\infty}\frac{\Gamma\left(v+\frac{1}{k}\right)}{v^{\frac{1}{k}}\Gamma(v)} = \frac{k}{2A_G(k)\Gamma\left(\frac{1}{k}\right)}$ because $\lim_{v\to\infty}\frac{\Gamma\left(v+\frac{1}{k}\right)}{v^{\frac{1}{k}}\Gamma(v)} = 1$ from (1.4.77). Thus, for $v\to\infty$, the generalized Cauchy pdf converges to the generalized normal pdf. For example, when $k = 2$ and $v\to\infty$, the generalized Cauchy pdf is a normal pdf.

Next, using $\lim_{p\to 0} p\Gamma(p) = \lim_{p\to 0}\Gamma(p+1) = 1$ shown in (1.4.76) and $\lim_{k\to\infty} A_G(k) = \sqrt{3}\,\sigma_G$ shown in (5.A.42), we get $\lim_{k\to\infty}\tilde{B}_c(k, v) = \frac{1}{2\sqrt{3}\,\sigma_G}$ when $v$ is fixed. In addition, $\lim_{k\to\infty}\tilde{D}_c(x) = 1$ when $|x| < \sqrt{3}\,\sigma_G$ and $\lim_{k\to\infty}\tilde{D}_c(x) = \infty$ when $|x| > \sqrt{3}\,\sigma_G$. In short, when $v$ is fixed and $k\to\infty$, we have

$$f_{GC}(x) \to \frac{1}{2\sqrt{3}\,\sigma_G}\, u\left(\sqrt{3}\,\sigma_G - |x|\right), \tag{5.A.47}$$

i.e., the limit of the generalized Cauchy pdf is a uniform pdf as shown also in Fig. 5.10.

After some steps, we can obtain the $r$-th moment

$$E\left\{X^r\right\} = v^{\frac{r}{k}}\,\frac{\Gamma\left(v-\frac{r}{k}\right)}{\Gamma(v)}\,\sigma_G^r\,\frac{\Gamma^{\frac{r}{2}-1}\left(\frac{1}{k}\right)\Gamma\left(\frac{r+1}{k}\right)}{\Gamma^{\frac{r}{2}}\left(\frac{3}{k}\right)} \tag{5.A.48}$$

for $vk > r$ when $r$ is an even number, and the variance

$$\sigma_{GC}^2 = \sigma_G^2\, v^{\frac{2}{k}}\,\frac{\Gamma\left(v-\frac{2}{k}\right)}{\Gamma(v)} \tag{5.A.49}$$

of the generalized Cauchy pdf.
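For a concrete case, take $k = 2$, $v = 2$, and $\sigma_G = 1$, for which (5.A.49) gives variance $v^{2/k}\Gamma(v-2/k)/\Gamma(v) = 2$. A small numerical sketch of (5.A.44) (numpy is assumed; the truncation at $\pm 200$ and the grid size are arbitrary):

```python
import numpy as np
from math import gamma, sqrt

k, v, sigma_G = 2.0, 2.0, 1.0
A = sqrt(sigma_G**2 * gamma(1.0/k) / gamma(3.0/k))      # A_G(k)
B = k * gamma(v + 1.0/k) / (2.0 * v**(1.0/k) * A * gamma(v) * gamma(1.0/k))

x = np.linspace(-200.0, 200.0, 400001)
h = x[1] - x[0]
f = B * (1.0 + (np.abs(x)/A)**k / v) ** (-(v + 1.0/k))  # (5.A.44)

area = f.sum() * h          # should be ~ 1
var = (x**2 * f).sum() * h  # should be ~ 2 by (5.A.49)
print(round(area, 3), round(var, 3))
```

The algebraic tail $|x|^{-(kv+1)} = |x|^{-5}$ still has a finite second moment here since $vk = 4 > 2$, consistent with the condition under (5.A.48).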

(C) Stable Distribution

The class of stable distributions is also a useful class for modeling impulsive environments in a variety of scenarios. Unlike the generalized Gaussian and generalized Cauchy distributions, the stable distributions are defined by their cf's.

Definition 5.A.3 (stable distribution) A distribution with the cf

$$\varphi(t) = \exp\left[jmt - \gamma|t|^{\alpha}\left\{1 + j\beta\,\mathrm{sgn}(t)\,\omega(t, \alpha)\right\}\right] \tag{5.A.50}$$

is called a stable distribution. Here, $0 < \alpha \le 2$, $|\beta| \le 1$, $m$ is a real number, $\gamma > 0$, and

$$\omega(t, \alpha) = \begin{cases} \tan\frac{\alpha\pi}{2}, & \text{if } \alpha \ne 1, \\ \frac{2}{\pi}\log|t|, & \text{if } \alpha = 1. \end{cases} \tag{5.A.51}$$

In Definition 5.A.3, the numbers m, α, β, and γ are called the location parameter,
characteristic exponent, symmetry parameter, and dispersion parameter, respectively.
The location parameter m represents the mean when 1 < α ≤ 2 and the median when
0 < α ≤ 1. The characteristic exponent α represents the weight or length of the tail of
the pdf, with a smaller value denoting a longer tail or a higher degree of impulsiveness.
The symmetry parameter β determines the symmetry of the pdf with β = 0 resulting
in a symmetric pdf. The dispersion parameter γ plays a role similar to the variance
of a normal distribution. For instance, the stable distribution is a normal distribution
and the variance is 2γ when α = 2. When α = 1 and β = 0, the stable distribution
is a Cauchy distribution.

Definition 5.A.4 (symmetric alpha-stable distribution) When the symmetry param-


eter β = 0, the stable distribution is called the symmetric α-stable (SαS) distribution.
When the location parameter m = 0 and the dispersion parameter γ = 1, the SαS
distribution is called the standard SαS distribution.

By inverse transforming the cf

$$\varphi(t) = \exp\left(-\gamma|t|^{\alpha}\right) \tag{5.A.52}$$


[Fig. 5.11 The pdf of the SαS distribution for α = 0.6, 1.0, 1.4, 2.0]

of the SαS distribution with $m = 0$, we have the pdf



$$f(x) = \begin{cases} \dfrac{1}{\pi\gamma^{1/\alpha}}\displaystyle\sum_{k=1}^{\infty}\dfrac{(-1)^{k-1}}{k!}\,\Gamma(\alpha k+1)\sin\left(\dfrac{k\alpha\pi}{2}\right)\left(\dfrac{|x|}{\gamma^{1/\alpha}}\right)^{-\alpha k-1}, & \text{for } 0 < \alpha \le 1, \\[2ex] \dfrac{1}{\pi\alpha\gamma^{1/\alpha}}\displaystyle\sum_{k=0}^{\infty}\dfrac{(-1)^k}{(2k)!}\,\Gamma\left(\dfrac{2k+1}{\alpha}\right)\left(\dfrac{x}{\gamma^{1/\alpha}}\right)^{2k}, & \text{for } 1 \le \alpha \le 2. \end{cases} \tag{5.A.53}$$

It is known that the pdf (5.A.53) can be expressed more explicitly in a closed form when $\alpha = 1$ and 2. Figure 5.11 shows pdf's of the SαS distributions.

Let us show that the two infinite series in (5.A.53) become the Cauchy pdf

$$f(x) = \frac{\gamma}{\pi\left(x^2+\gamma^2\right)} \tag{5.A.54}$$

when $\alpha = 1$, and that the second infinite series of (5.A.53) is the normal pdf

$$f(x) = \frac{1}{2\sqrt{\pi\gamma}}\exp\left(-\frac{x^2}{4\gamma}\right) \tag{5.A.55}$$

when $\alpha = 2$. The first infinite series in (5.A.53) for $\alpha = 1$ can be expressed as $\frac{1}{\pi\gamma}\sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{k!}\Gamma(k+1)\sin\left(\frac{k\pi}{2}\right)\left(\frac{|x|}{\gamma}\right)^{-k-1} = \frac{1}{\pi\gamma}\left\{\left(\frac{\gamma}{|x|}\right)^2 - \left(\frac{\gamma}{|x|}\right)^4 + \left(\frac{\gamma}{|x|}\right)^6 - \left(\frac{\gamma}{|x|}\right)^8 + \cdots\right\} = \frac{1}{\pi\gamma}\sum_{k=0}^{\infty}(-1)^k\left(\frac{\gamma^2}{x^2}\right)^{k+1}$, i.e.,

$$\frac{1}{\pi\gamma}\sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{k!}\,\Gamma(k+1)\sin\left(\frac{k\pi}{2}\right)\left(\frac{|x|}{\gamma}\right)^{-k-1} = \frac{\gamma}{\pi\left(x^2+\gamma^2\right)}, \tag{5.A.56}$$

which can also be obtained from the second infinite series of (5.A.53) as $\frac{1}{\pi\gamma}\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}\Gamma(2k+1)\left(\frac{x}{\gamma}\right)^{2k} = \frac{1}{\pi\gamma}\sum_{k=0}^{\infty}(-1)^k\left(\frac{x}{\gamma}\right)^{2k} = \frac{\gamma}{\pi\left(x^2+\gamma^2\right)}$. Next, noting that $\Gamma\left(\frac{2k+1}{2}\right) = \frac{(2k)!}{2^{2k}k!}\sqrt{\pi}$ shown in (1.4.84) and that $\sum_{k=0}^{\infty}\frac{(-x)^k}{k!} = e^{-x}$, the second infinite series of (5.A.53) for $\alpha = 2$ can be rewritten as $\frac{1}{2\pi\sqrt{\gamma}}\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}\Gamma\left(\frac{2k+1}{2}\right)\left(\frac{x}{\sqrt{\gamma}}\right)^{2k} = \frac{1}{2\sqrt{\pi\gamma}}\sum_{k=0}^{\infty}\frac{1}{k!}\left(-\frac{x^2}{4\gamma}\right)^k$, i.e.,

$$\frac{1}{2\pi\sqrt{\gamma}}\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}\,\Gamma\left(\frac{2k+1}{2}\right)\left(\frac{x}{\sqrt{\gamma}}\right)^{2k} = \frac{1}{2\sqrt{\pi\gamma}}\exp\left(-\frac{x^2}{4\gamma}\right). \tag{5.A.57}$$
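The identity (5.A.57) is easy to confirm numerically with partial sums of the series, since it converges for every $x$; a small standard-library sketch (the test points and $\gamma = 0.8$ are arbitrary choices):

```python
import math

def sas2_series(x, gamma_, terms=80):
    # partial sum of the left side of (5.A.57); Gamma((2k+1)/2) = Gamma(k + 1/2)
    s = 0.0
    for k in range(terms):
        s += ((-1)**k / math.factorial(2*k)) * math.gamma(k + 0.5) \
             * (x / math.sqrt(gamma_))**(2*k)
    return s / (2.0 * math.pi * math.sqrt(gamma_))

def sas2_closed(x, gamma_):
    # right side of (5.A.57): the N(0, 2*gamma) pdf
    return math.exp(-x*x / (4.0*gamma_)) / (2.0 * math.sqrt(math.pi * gamma_))

errs = [abs(sas2_series(x, 0.8) - sas2_closed(x, 0.8)) for x in (0.0, 0.7, 1.5, 3.0)]
print(max(errs) < 1e-10)
```

The $\alpha = 1$ series (5.A.56), by contrast, is a geometric expansion valid only for $|x| > \gamma$, so a similar check would have to stay outside that radius.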

When $A \sim U\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and an exponential random variable $B$ with mean 1 are independent of each other, it is known that

$$X = \frac{\sin(\alpha A)}{(\cos A)^{\frac{1}{\alpha}}}\left\{\frac{\cos\{(1-\alpha)A\}}{B}\right\}^{\frac{1-\alpha}{\alpha}} \tag{5.A.58}$$

is a standard SαS random variable. This result is useful when generating random numbers obeying the SαS distribution.
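Formula (5.A.58) turns two easy-to-generate variates into an SαS variate. A minimal generator sketch, checked against the two closed-form cases: for $\alpha = 1$ the standard SαS law is standard Cauchy, so $P(|X| \le 1) = \frac{1}{2}$, and for $\alpha = 2$ it is $N(0, 2)$ (variance $2\gamma$ with $\gamma = 1$); the seed and sample size are arbitrary choices:

```python
import math, random

def sas_standard(alpha, u=random.random):
    # (5.A.58): A ~ U(-pi/2, pi/2), B ~ Exp(1), independent
    A = (u() - 0.5) * math.pi
    B = -math.log(u())
    x = math.sin(alpha * A) / math.cos(A) ** (1.0 / alpha)
    if alpha != 1.0:
        x *= (math.cos((1.0 - alpha) * A) / B) ** ((1.0 - alpha) / alpha)
    return x

random.seed(7)
n = 200000
cauchy = [sas_standard(1.0) for _ in range(n)]   # reduces to tan(A)
normal = [sas_standard(2.0) for _ in range(n)]   # reduces to 2 sin(A) sqrt(B)

p = sum(abs(v) <= 1.0 for v in cauchy) / n       # ~ 0.5 for standard Cauchy
var = sum(v*v for v in normal) / n               # ~ 2 for N(0, 2)
print(round(p, 2), round(var, 1))
```

Note that for $\alpha = 1$ the exponent $(1-\alpha)/\alpha$ vanishes, so the second factor of (5.A.58) drops out and $X = \tan A$, which is exactly the standard Cauchy construction.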

Definition 5.A.5 (bi-variate isotropic SαS distribution) When the joint pdf of a random vector $(X, Y)$ can be expressed as

$$f_{X,Y}(x, y) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\left\{-\gamma\left(\omega_1^2+\omega_2^2\right)^{\frac{\alpha}{2}}\right\}\exp\left\{-j\left(x\omega_1+y\omega_2\right)\right\} d\omega_1 d\omega_2, \tag{5.A.59}$$

the distribution of $(X, Y)$ is called the bi-variate isotropic SαS distribution.

Expressing the pdf (5.A.59) of the bi-variate isotropic SαS distribution in infinite series, we have

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{1}{\pi^2\gamma^{2/\alpha}}\displaystyle\sum_{k=1}^{\infty}\dfrac{2^{\alpha k}(-1)^{k-1}}{k!}\,\Gamma^2\left(1+\dfrac{\alpha k}{2}\right)\sin\left(\dfrac{k\alpha\pi}{2}\right)\left(\dfrac{\sqrt{x^2+y^2}}{\gamma^{1/\alpha}}\right)^{-\alpha k-2}, & \text{for } 0 < \alpha \le 1, \\[2ex] \dfrac{1}{2\pi\alpha\gamma^{2/\alpha}}\displaystyle\sum_{k=0}^{\infty}\dfrac{1}{(k!)^2}\,\Gamma\left(\dfrac{2k+2}{\alpha}\right)\left(-\dfrac{x^2+y^2}{4\gamma^{2/\alpha}}\right)^k, & \text{for } 1 \le \alpha \le 2. \end{cases} \tag{5.A.60}$$

Example 5.A.1 Show that (5.A.60) represents a bi-variate Cauchy distribution and a bi-variate normal distribution for $\alpha = 1$ and $\alpha = 2$, respectively. In other words, show that the two infinite series of (5.A.60) become

$$f_{X,Y}(x, y) = \frac{\gamma}{2\pi\left(x^2+y^2+\gamma^2\right)^{\frac{3}{2}}} \tag{5.A.61}$$

when $\alpha = 1$ and that the second infinite series of (5.A.60) becomes

$$f_{X,Y}(x, y) = \frac{1}{4\pi\gamma}\exp\left(-\frac{x^2+y^2}{4\gamma}\right) \tag{5.A.62}$$

when $\alpha = 2$.

Solution First, note that we have $\Gamma\left(\frac{2k+3}{2}\right) = \Gamma\left(\frac{1}{2}+k+1\right) = \frac{(2k+1)!}{2^{2k+1}k!}\sqrt{\pi}$ from (1.4.75) and (1.4.84). Thus, recollecting that $(1+x)^{-\frac{3}{2}} = \sum_{k=0}^{\infty}{}_{-\frac{3}{2}}C_k\, x^k$, i.e.,

$$(1+x)^{-\frac{3}{2}} = \sum_{k=0}^{\infty}\frac{(-1)^k(2k+1)!}{2^{2k}(k!)^2}\, x^k, \tag{5.A.63}$$

we get

$$\begin{aligned} \frac{1}{\pi^2\gamma^2} & \sum_{k=1}^{\infty}\frac{2^k(-1)^{k-1}}{k!}\,\Gamma^2\left(\frac{k}{2}+1\right)\sin\left(\frac{k\pi}{2}\right)\left(\frac{\sqrt{x^2+y^2}}{\gamma}\right)^{-k-2} \\ & = \frac{1}{\pi^2\left(x^2+y^2\right)}\left\{\frac{2^1\Gamma^2\left(\frac{3}{2}\right)}{1!}\frac{\gamma}{\sqrt{x^2+y^2}} - \frac{2^3\Gamma^2\left(\frac{5}{2}\right)}{3!}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^3 + \frac{2^5\Gamma^2\left(\frac{7}{2}\right)}{5!}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^5 - \frac{2^7\Gamma^2\left(\frac{9}{2}\right)}{7!}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^7 + \cdots\right\} \\ & = \frac{1}{\pi^2\left(x^2+y^2\right)}\sum_{k=0}^{\infty}\frac{(-1)^k 2^{2k+1}}{(2k+1)!}\,\Gamma^2\left(\frac{2k+3}{2}\right)\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^{2k+1} \\ & = \frac{1}{\pi^2\left(x^2+y^2\right)}\sum_{k=0}^{\infty}\frac{(-1)^k\pi(2k+1)!}{2^{2k+1}(k!)^2}\left(\frac{\gamma}{\sqrt{x^2+y^2}}\right)^{2k+1} \\ & = \frac{\gamma}{2\pi\left(x^2+y^2\right)^{\frac{3}{2}}}\sum_{k=0}^{\infty}\frac{(-1)^k(2k+1)!}{2^{2k}(k!)^2}\left(\frac{\gamma^2}{x^2+y^2}\right)^k \\ & = \frac{\gamma}{2\pi\left(x^2+y^2\right)^{\frac{3}{2}}}\left(1+\frac{\gamma^2}{x^2+y^2}\right)^{-\frac{3}{2}} = \frac{\gamma}{2\pi\left(x^2+y^2+\gamma^2\right)^{\frac{3}{2}}} \end{aligned} \tag{5.A.64}$$

when $\alpha = 1$ from the first infinite series of (5.A.60). The result (5.A.64) can also be obtained from the second infinite series of (5.A.60) using (5.A.63) as $\sum_{k=0}^{\infty}\frac{\Gamma(2k+2)}{2\pi\gamma^2(k!)^2}\left(-\frac{x^2+y^2}{4\gamma^2}\right)^k = \frac{1}{2\pi\gamma^2}\sum_{k=0}^{\infty}\frac{(-1)^k(2k+1)!}{2^{2k}(k!)^2}\left(\frac{x^2+y^2}{\gamma^2}\right)^k = \frac{1}{2\pi\gamma^2}\left(1+\frac{x^2+y^2}{\gamma^2}\right)^{-\frac{3}{2}}$, i.e.,

$$\sum_{k=0}^{\infty}\frac{\Gamma(2k+2)}{2\pi\gamma^2(k!)^2}\left(-\frac{x^2+y^2}{4\gamma^2}\right)^k = \frac{\gamma}{2\pi\left(x^2+y^2+\gamma^2\right)^{\frac{3}{2}}}. \tag{5.A.65}$$

Next, when $\alpha = 2$, from the second infinite series of (5.A.60), we get $\frac{1}{4\pi\gamma}\sum_{k=0}^{\infty}\frac{\Gamma(k+1)}{(k!)^2}\left(-\frac{x^2+y^2}{4\gamma}\right)^k = \frac{1}{4\pi\gamma}\sum_{k=0}^{\infty}\frac{1}{k!}\left(-\frac{x^2+y^2}{4\gamma}\right)^k$, which is the same as (5.A.62) because $\sum_{k=0}^{\infty}\frac{(-x)^k}{k!} = e^{-x}$. ♦

Exercises

Exercise 5.1 Assume a random vector $(X, Y)$ with the joint pdf

$$f_{X,Y}(x, y) = \frac{1}{\sqrt{3}\,\pi}\exp\left\{-\frac{2}{3}\left(x^2+y^2\right)\right\}\cosh\left(\frac{2}{3}xy\right). \tag{5.E.1}$$

(1) Show that X ∼ N (0, 1) and Y ∼ N (0, 1).


(2) Show that X and Y are uncorrelated.
(3) Is (X, Y ) a bi-variate normal random vector?
(4) Are X and Y independent of each other?

Exercise 5.2 When $X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 1)$ are independent of each other, obtain the conditional joint pdf of $X_1$ and $X_2$ given that $X_1^2 + X_2^2 < a^2$.

Exercise 5.3 Assume that $X \sim N(0, 1)$ and $Y \sim N(0, 1)$ are independent of each other.

(1) Obtain the joint pdf of $U = X^2 + Y^2$ and $V = \tan^{-1}\frac{Y}{X}$.
(2) Obtain the joint pdf of $U = \frac{1}{2}(X+Y)$ and $V = \frac{1}{2}(X-Y)^2$.

Exercise 5.4 Obtain the conditional pdf’s f Y |X (y|x) and f X |Y (x|y) when (X, Y )
∼ N (3, 4, 1, 2, 0.5).

Exercise 5.5 Obtain the correlation coefficient $\rho_{ZW}$ between $Z = X_1\cos\theta + X_2\sin\theta$ and $W = X_2\cos\theta - X_1\sin\theta$, and show that

$$0 \le \rho_{ZW}^2 \le \left(\frac{\sigma_1^2-\sigma_2^2}{\sigma_1^2+\sigma_2^2}\right)^2 \tag{5.E.2}$$

when $X_1 \sim N\left(\mu_1, \sigma_1^2\right)$ and $X_2 \sim N\left(\mu_2, \sigma_2^2\right)$ are independent of each other.

Exercise 5.6 When the two normal random variables X and Y are independent of
each other, show that X + Y and X − Y are independent of each other.

Exercise 5.7 Let us consider (5.2.1) and (5.2.2) when $n = 3$ and $s = 1$. Based on (5.1.18) and (5.1.21), show that $\left(\Psi_{22} - \Psi_{21}\Psi_{11}^{-1}\Psi_{12}\right)^{-1}$ is equal to $K_{22} = \begin{pmatrix} 1 & \rho_{23} \\ \rho_{23} & 1 \end{pmatrix}$.
Exercise 5.8 Consider the random variable

$$X = \begin{cases} Y, & \text{when } Z = +1, \\ -Y, & \text{when } Z = -1, \end{cases} \tag{5.E.3}$$

where $Z$ is a binary random variable with pmf $p_Z(1) = p_Z(-1) = 0.5$ and $Y \sim N(0, 1)$.
(1) Obtain the conditional cdf FX |Y (x|y).
(2) Obtain the cdf FX (x) of X and determine whether or not X is normal.
(3) Is the random vector (X, Y ) normal?
(4) Obtain the conditional pdf f X |Y (x|y) and the joint pdf f X,Y (x, y).

Exercise 5.9 For a zero-mean normal random vector $X$ with covariance matrix $\begin{pmatrix} 1 & \frac{1}{6} & \frac{1}{36} \\ \frac{1}{6} & 1 & \frac{1}{6} \\ \frac{1}{36} & \frac{1}{6} & 1 \end{pmatrix}$, find a linear transformation to decorrelate $X$.
Exercise 5.10 Let $X = (X, Y)$ denote the coordinate of a point in the two-dimensional plane and $C = (R, \Theta)$ be its polar coordinate. Specifically, as shown in Fig. 5.12, $R = \sqrt{X^2+Y^2}$ is the distance from the origin to $X$, and $\Theta = \angle X$ is the angle between the positive $x$-axis and the line from the origin to $X$, where we assume $-\pi < \Theta \le \pi$. Express the joint pdf of $C$ in terms of the joint pdf of $X$. When $X$ is an i.i.d. random vector with marginal distribution $N\left(0, \sigma^2\right)$, prove or disprove that $C$ is an independent random vector.

Exercise 5.11 For the limit pdf $\lim_{\rho\to\pm 1} f_{X_1,X_2}(x, y)$ shown in (5.1.15), show that $\int_{-\infty}^{\infty}\lim_{\rho\to\pm 1} f_{X_1,X_2}(x, y)\, dy = f_{X_1}(x)$ and $\int_{-\infty}^{\infty}\lim_{\rho\to\pm 1} f_{X_1,X_2}(x, y)\, dx = f_{X_2}(y)$.

Exercise 5.12 Consider a zero-mean normal random vector $(X_1, X_2, X_3)$ with covariance matrix $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}$. Obtain the conditional distribution of $X_3$ when $X_1 = X_2 = 1$.
[Fig. 5.12 Polar coordinate $C = (R, \Theta) = (|X|, \angle X)$ for $X = (X, Y)$]

Exercise 5.13 Consider the linear transformation $(Z, W) = (aX+bY, cX+dY)$ of $(X, Y) \sim N\left(m_X, m_Y, \sigma_X^2, \sigma_Y^2, \rho\right)$. When $ad-bc \ne 0$, find the requirement on $\{a, b, c, d\}$ for $Z$ and $W$ to be independent of each other.
Exercise 5.14 When $(X, Y) \sim N\left(0, 0, \sigma_X^2, \sigma_Y^2, \rho\right)$, we have $E\left\{X^2\right\} = \sigma_X^2$ and $E\left\{X^4\right\} = 3\sigma_X^4$. Based on these two results and (4.4.44), obtain $E\{XY\}$, $E\left\{XY^3\right\}$, $E\left\{X^3Y\right\}$, and $E\left\{X^2Y^2\right\}$. Compare the results with those you can obtain from (5.3.22) or (5.3.51).

Exercise 5.15 For a standard tri-variate normal random vector $(X_1, X_2, X_3)$, denote the covariance by $E\left\{X_i X_j\right\} - E\left\{X_i\right\}E\left\{X_j\right\} = \rho_{ij}$. Show that

$$E\left\{X_1^2 X_2^2 X_3^2\right\} = 1 + 2\left(\rho_{12}^2+\rho_{23}^2+\rho_{31}^2\right) + 8\rho_{12}\rho_{23}\rho_{31} \tag{5.E.4}$$

based on the moment theorem. Show (5.E.4) based on the Taylor series of the cf.
  
Exercise 5.16 When $(Z, W) \sim N\left(m_1, m_2, \sigma_1^2, \sigma_2^2, \rho\right)$, obtain $E\left\{Z^2 W^2\right\}$.

Exercise 5.17 Denote the joint moment by $\mu_{ij} = E\left\{X^i Y^j\right\}$ for a zero-mean random vector $(X, Y)$. Based on the moment theorem, (5.3.22), (5.3.30), or (5.3.51), obtain $\mu_{51}$, $\mu_{42}$, and $\mu_{33}$ for a random vector $(X, Y) \sim N\left(0, 0, \sigma_1^2, \sigma_2^2, \rho\right)$.

Exercise 5.18 Using the cf, prove (5.3.9) and (5.3.10).



Exercise 5.19 Denote the joint absolute moment by $\nu_{ij} = E\left\{|X|^i|Y|^j\right\}$ for a zero-mean random vector $(X, Y)$. By direct integration, show that $\nu_{11} = \frac{2}{\pi}\left(\sqrt{1-\rho^2}+\rho\sin^{-1}\rho\right)$ and $\nu_{21} = \sqrt{\frac{2}{\pi}}\left(1+\rho^2\right)$ for $(X, Y) \sim N(0, 0, 1, 1, \rho)$. For $(X, Y) \sim N\left(0, 0, \pi^2, 2, \frac{1}{\sqrt{2}}\right)$, calculate $E\{|XY|\}$.
 
Exercise 5.20 Based on (5.3.30), obtain $E\{|X_1|\}$, $E\{|X_1X_2|\}$, and $E\left\{\left|X_1X_2^3\right|\right\}$ for $X = (X_1, X_2) \sim N(0, 0, 1, 1, \rho)$. Show

$$\rho_{|X_1||X_2|} = \frac{2}{\pi-2}\left(\sqrt{1-\rho^2}+\rho\sin^{-1}\rho-1\right). \tag{5.E.5}$$

Next, based on Price's theorem, show that

$$E\left\{X_1 u(X_1)\, X_2 u(X_2)\right\} = \frac{1}{2\pi}\left\{\left(\frac{\pi}{2}+\sin^{-1}\rho\right)\rho + \sqrt{1-\rho^2}\right\}, \tag{5.E.6}$$

which¹² implies that $E\{WYu(W)u(Y)\} = \frac{\sigma_W\sigma_Y}{2\pi}\left\{\rho\cos^{-1}(-\rho)+\sqrt{1-\rho^2}\right\}$ when $(W, Y) \sim N\left(0, 0, \sigma_W^2, \sigma_Y^2, \rho\right)$. In addition, when $W = Y$, we can obtain $E\left\{W^2 u(W)\right\} = \frac{1}{2}\sigma_W^2$ with $\rho = 1$ and $\sigma_Y = \sigma_W$, which can be proved by a direct integration as $E\left\{W^2 u(W)\right\} = \int_0^{\infty}\frac{x^2}{\sqrt{2\pi\sigma_W^2}}\exp\left(-\frac{x^2}{2\sigma_W^2}\right) dx = \frac{1}{2}\int_{-\infty}^{\infty}\frac{x^2}{\sqrt{2\pi\sigma_W^2}}\exp\left(-\frac{x^2}{2\sigma_W^2}\right) dx = \frac{1}{2}\sigma_W^2$.
Exercise 5.21 Show

$$E\left\{X_1 X_2 |X_3|\right\} = \sqrt{\frac{2}{\pi}}\left(\rho_{12}+\rho_{23}\rho_{31}\right) \tag{5.E.7}$$

for a standard tri-variate normal random vector $(X_1, X_2, X_3)$.


Exercise 5.22 Based on Price's theorem, (5.1.26), and (5.1.27), show that¹³

$$E\left\{\delta(X_1)\,\delta(X_2)\,|X_3|\right\} = \frac{\sqrt{|K_3|}}{\sqrt{2\pi^3}\left(1-\rho_{12}^2\right)}, \tag{5.E.8}$$

$$E\left\{\delta(X_1)\, X_2 X_3\right\} = -\frac{c_{23}}{\sqrt{2\pi}}, \tag{5.E.9}$$

$$E\left\{\delta(X_1)\,\mathrm{sgn}(X_2)\, X_3\right\} = -\frac{c_{23}}{\pi\sqrt{1-\rho_{12}^2}}, \tag{5.E.10}$$

$$E\left\{\delta(X_1)\,|X_2|\,|X_3|\right\} = \sqrt{\frac{2}{\pi^3}}\left(\sqrt{|K_3|}-c_{23}\sin^{-1}\beta_{23,1}\right), \tag{5.E.11}$$

and

$$E\left\{\delta(X_1)\,\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\} = \sqrt{\frac{2}{\pi^3}}\sin^{-1}\beta_{23,1} \tag{5.E.12}$$

for a standard tri-variate normal random vector $(X_1, X_2, X_3)$, where $c_{ij} = \rho_{jk}\rho_{ki}-\rho_{ij}$. Then, show that

$$E\left\{X_1 |X_2|\,\mathrm{sgn}(X_3)\right\} = \frac{2}{\pi}\left(\rho_{12}\sin^{-1}\rho_{23}+\rho_{31}\sqrt{1-\rho_{23}^2}\right) \tag{5.E.13}$$

based on Price's theorem and (5.E.10).

¹² Here, the range of the inverse cosine function $\cos^{-1}x$ is $[0, \pi]$, and $\cos\left(\sin^{-1}\rho\right) = \sqrt{1-\rho^2}$. Note that, letting $\frac{\pi}{2}+\sin^{-1}\rho = \theta$, we get $\cos\theta = \cos\left(\frac{\pi}{2}+\sin^{-1}\rho\right) = -\sin\left(\sin^{-1}\rho\right) = -\rho$ and, subsequently, $\left(\frac{\pi}{2}+\sin^{-1}\rho\right)\rho = \rho\cos^{-1}(-\rho)$. Thus, we have $\theta = \cos^{-1}(-\rho)$.

¹³ Here, using $E\left\{\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\} = \frac{2}{\pi}\sin^{-1}\rho_{23}$ obtained in (5.3.29) and $E\{\delta(X_1)\} = \frac{1}{\sqrt{2\pi}}$, we can obtain $E\left\{\delta(X_1)\,\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\}\big|_{\rho_{31}=\rho_{12}=0} = E\{\delta(X_1)\}\,E\left\{\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\} = \sqrt{\frac{2}{\pi^3}}\sin^{-1}\rho_{23}$ from (5.E.12) when $\rho_{31} = \rho_{12} = 0$. This result is the same as $\sqrt{\frac{2}{\pi^3}}\sin^{-1}\beta_{23,1}\big|_{\rho_{31}=\rho_{12}=0}$.

Exercise 5.23 Using (5.3.38)–(5.3.43), show that

$$E\left\{|X_1X_2X_3|\right\} = \sqrt{\frac{8}{\pi^3}}\left\{\sqrt{|K_3|}+\sum^{c}\left(\rho_{ij}+\rho_{jk}\rho_{ki}\right)\kappa_{ijk}\right\}, \tag{5.E.14}$$

$$E\left\{\mathrm{sgn}(X_1)\,\mathrm{sgn}(X_2)\,|X_3|\right\} = \sqrt{\frac{8}{\pi^3}}\left(\kappa_{123}+\rho_{23}\kappa_{312}+\rho_{31}\kappa_{231}\right), \tag{5.E.15}$$

and

$$E\left\{X_1^2\,\mathrm{sgn}(X_2)\,\mathrm{sgn}(X_3)\right\} = \frac{2}{\pi}\,\frac{2\rho_{31}\rho_{12}-\rho_{23}\rho_{12}^2-\rho_{23}\rho_{31}^2}{\sqrt{1-\rho_{23}^2}} + \frac{2}{\pi}\sin^{-1}\rho_{23} \tag{5.E.16}$$

for¹⁴ a standard tri-variate normal random vector $(X_1, X_2, X_3)$, where $\kappa_{ijk} = \sin^{-1}\beta_{ij,k}$. Confirm (5.E.7) and (5.E.13).

Exercise 5.24 Confirm (5.E.8)¹⁵ and (5.E.12) based on (5.E.15). Based on (5.E.16), obtain $E\left\{X_1^2\,\delta(X_2)\,\delta(X_3)\right\}$ and confirm (5.E.10).

 
¹⁴ We can easily get $E\left\{\left|X_1^2X_2\right|\right\} = E\{|X_1X_2X_3|\}\big|_{X_3\to X_1} = E\{|X_1X_2X_3|\}\big|_{\rho_{31}=1} = \sqrt{\frac{8}{\pi^3}}\left\{0+\frac{\pi}{2}\left(1+\rho^2\right)\right\} = \sqrt{\frac{2}{\pi}}\left(1+\rho^2\right)$ with (5.E.14). Similarly, with (5.E.15), it is easy to get $E\{|X_3|\} = E\left\{\mathrm{sgn}(X_1)\,\mathrm{sgn}(X_2)\,|X_3|\right\}\big|_{X_2\to X_1} = E\left\{\mathrm{sgn}(X_1)\,\mathrm{sgn}(X_2)\,|X_3|\right\}\big|_{\rho_{12}=1} = \sqrt{\frac{8}{\pi^3}}\left(\sin^{-1}\frac{1-\rho_{23}^2}{1-\rho_{23}^2}+2\rho_{23}\sin^{-1}0\right) = \sqrt{\frac{2}{\pi}}$ and $E\{\mathrm{sgn}(X_1)X_2\} = E\left\{\mathrm{sgn}(X_1)\,\mathrm{sgn}(X_2)\,|X_3|\right\}\big|_{\rho_{23}=1} = \sqrt{\frac{8}{\pi^3}}\left(0+0+\rho_{12}\sin^{-1}1\right) = \sqrt{\frac{2}{\pi}}\,\rho_{12}$. Next, when $|\rho_{23}|\to 1$, we have $\rho_{31}\to\mathrm{sgn}(\rho_{23})\rho_{12}$ because $X_3\to X_2$, and thus $\lim_{\rho_{23}\to 1}\left(2\rho_{31}\rho_{12}-\rho_{23}\rho_{12}^2-\rho_{23}\rho_{31}^2\right) = 0$. Consequently, we get $\lim_{\rho_{23}\to 1}\frac{2\rho_{31}\rho_{12}-\rho_{23}\rho_{12}^2-\rho_{23}\rho_{31}^2}{\sqrt{1-\rho_{23}^2}} = \lim_{\rho_{23}\to 1}\frac{-\rho_{12}^2-\rho_{31}^2}{\frac{-2\rho_{23}}{2\sqrt{1-\rho_{23}^2}}} = \lim_{\rho_{23}\to 1}\frac{\left(\rho_{12}^2+\rho_{31}^2\right)\sqrt{1-\rho_{23}^2}}{\rho_{23}} = 0$ in (5.E.16) using L'Hospital's theorem.

¹⁵ Based on this result, we have $\int\frac{\sqrt{|K_3|}}{1-\rho_{12}^2}\, d\rho_{12} = \sin^{-1}\beta_{12,3}+\rho_{23}\sin^{-1}\beta_{31,2}+\rho_{31}\sin^{-1}\beta_{23,1}+h\left(\rho_{23}, \rho_{31}\right)$ for a function $h$.

Exercise 5.25 Find the coefficient of the term $\tilde{\rho}_{12}^2\,\tilde{\rho}_{22}\,\tilde{\rho}_{34}^4\, m_1 m_4$ in the expansion of the joint moment $E\left\{X_1^3 X_2^4 X_3^4 X_4^5\right\}$ for a quadri-variate normal random vector $(X_1, X_2, X_3, X_4)$.

Exercise 5.26 Using Price's theorem, confirm that

$$E\{|X_1X_2|\} = \frac{2}{\pi}\left(\sqrt{1-\rho^2}+\rho\sin^{-1}\rho\right) \tag{5.E.17}$$

for $(X_1, X_2) \sim N(0, 0, 1, 1, \rho)$. The result (5.E.17) is obtained with other methods in Exercises 5.19 and 5.20. When $(X_1, X_2) \sim N(0, 0, 1, 1, \rho)$ and $X_1 = X_2$, (5.E.17) can be written as $E\left\{X_1^2\right\} = \frac{2}{\pi}\left(\rho\sin^{-1}\rho+\sqrt{1-\rho^2}\right)\sigma_1\sigma_2\big|_{\rho=1,\,\sigma_2=\sigma_1} = \sigma_1^2$, implying $E\left\{X^2\right\} = \sigma^2$ when $X \sim N\left(0, \sigma^2\right)$.

Exercise 5.27 Let us show some results related to the general formula (5.3.51) for the joint moments of normal random vectors.

(1) Confirm the coefficient

$$d_{a,j} = \frac{a!}{2^j j!(a-2j)!} \tag{5.E.18}$$

in (5.3.17).

(2) Recollecting (5.3.46)–(5.3.48), show that

$$E\left\{X_1^{a_1}X_2^{a_2}X_3^{a_3}\right\} = \sum_{l\in S_a} d_{a,l}\left(\prod_{i=1}^{3}\prod_{j=i}^{3}\tilde{\rho}_{ij}^{\,l_{ij}}\right)\left(\prod_{j=1}^{3} m_j^{L_{a,j}}\right) \tag{5.E.19}$$

for $\{a_i = 0, 1, \ldots\}_{i=1}^{3}$, where

$$d_{a,l} = \frac{a_1! a_2! a_3!}{2^{\sum_{j=1}^{3} l_{jj}}\left(\prod_{i=1}^{3}\prod_{j=i}^{3} l_{ij}!\right)\left(\prod_{j=1}^{3} L_{a,j}!\right)} \tag{5.E.20}$$

when $n = 3$.

(3) Show that (5.3.51) satisfies (5.A.1) and (5.A.2).

Exercise 5.28 When $r$ is even, show the moment

$$E\left\{X^r\right\} = \sigma_G^r\,\frac{\Gamma^{\frac{r}{2}-1}\left(\frac{1}{k}\right)\Gamma\left(\frac{r+1}{k}\right)}{\Gamma^{\frac{r}{2}}\left(\frac{3}{k}\right)} \tag{5.E.21}$$

for the generalized normal random variable $X$ with the pdf (5.A.39).

Exercise 5.29 When $vk > r$ and $r$ is even, show the $r$-th moment

$$E\left\{X^r\right\} = v^{\frac{r}{k}}\,\frac{\Gamma\left(v-\frac{r}{k}\right)}{\Gamma(v)}\,\sigma_G^r\,\frac{\Gamma^{\frac{r}{2}-1}\left(\frac{1}{k}\right)\Gamma\left(\frac{r+1}{k}\right)}{\Gamma^{\frac{r}{2}}\left(\frac{3}{k}\right)} \tag{5.E.22}$$

for the generalized Cauchy random variable $X$ with the pdf (5.A.44).
Exercise 5.30 Obtain the pdf $f_X(x)$ of $X$ from the joint pdf $f_{X,Y}(x, y) = \frac{\gamma}{2\pi}\left(x^2+y^2+\gamma^2\right)^{-\frac{3}{2}}$ shown in (5.A.61). Confirm that the pdf is the same as the pdf $f(r) = \frac{\alpha}{\pi}\left(r^2+\alpha^2\right)^{-1}$ obtained by letting $\beta = 0$ in (2.5.28).
Exercise 5.31 Show that the mgf of the sample mean is

$$M_{\overline{X}_n}(t) = M^n\left(\frac{t}{n}\right) \tag{5.E.23}$$

for a sample $X = (X_1, X_2, \ldots, X_n)$ with marginal mgf $M(t)$.


Exercise 5.32 Obtain the mean and variance, and show the mgf

$$M_Y(t) = (1-2t)^{-\frac{n}{2}}\exp\left(\frac{\delta t}{1-2t}\right), \tag{5.E.24}$$

for $Y \sim \chi^2(n, \delta)$.


Exercise 5.33 Show the $r$-th moment

$$E\left\{X^r\right\} = \begin{cases} \dfrac{k^{\frac{r}{2}}\,\Gamma\left(\frac{k-r}{2}\right)\Gamma\left(\frac{r+1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{k}{2}\right)}, & \text{when } r < k \text{ and } r \text{ is even}, \\ 0, & \text{when } r < k \text{ and } r \text{ is odd} \end{cases} \tag{5.E.25}$$

of $X \sim t(k)$.
Exercise 5.34 Obtain the mean and variance of Z ∼ t (n, δ).
Exercise 5.35 For $H \sim F(m, n)$, show that

$$E\left\{H^k\right\} = \left(\frac{n}{m}\right)^k\frac{\Gamma\left(\frac{m}{2}+k\right)\Gamma\left(\frac{n}{2}-k\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \tag{5.E.26}$$

for $k = 1, 2, \ldots, \left\lceil\frac{n}{2}\right\rceil - 1$.
Exercise 5.36 Obtain the mean and variance of H ∼ F(m, n, δ).
Exercise 5.37 For i.i.d. random variables $X_1$, $X_2$, $X_3$, and $X_4$ with marginal distribution $N(0, 1)$, show that the pdf of $Y = X_1X_2 + X_3X_4$ is $f_Y(y) = \frac{1}{2}e^{-|y|}$.

Exercise 5.38 Show that the distribution of $Y = -2\ln X$ is $\chi^2(2)$ when $X \sim U(0, 1)$. When $\left\{X_i \sim U(0, 1)\right\}_{i=1}^{k}$ are all independent of each other, show that $\sum_{i=1}^{k}\left(-2\ln X_i\right) \sim \chi^2(2k)$.

Exercise 5.39 Prove that

$$\overline{X}_n = \overline{X}_{n-1} + \frac{1}{n}\left(X_n - \overline{X}_{n-1}\right) \tag{5.E.27}$$

for the sample mean $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ with $\overline{X}_0 = 0$.
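Before proving (5.E.27) formally, the recursion can be sanity-checked against the direct average on a toy data list; a minimal illustration (the data values are arbitrary):

```python
# numeric illustration of the running-mean recursion (5.E.27)
xs = [3.0, -1.0, 4.0, 1.0, 5.0, -9.0, 2.0, 6.0]
mean_rec = 0.0                           # the initial value, X-bar_0 = 0
for n, xn in enumerate(xs, start=1):
    mean_rec += (xn - mean_rec) / n      # (5.E.27)
    assert abs(mean_rec - sum(xs[:n]) / n) < 1e-12
print(mean_rec)                          # 1.375 = sum(xs) / 8
```

The recursion updates the mean in constant memory, which is why it is the standard form for streaming averages.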

Exercise 5.40 Let us denote the k-th central moment of X i by E (X i − m)k = μk
 
for k = 0, 1, . . .. Obtain the fourth central moment μ4 X n of the sample mean X n
for a sample X = (X 1 , X 2 , . . . , X n ).

Exercise 5.41 Prove Theorem 5.1.4 by taking the steps described below.
(1) Show that the pdf f 3 (x, y, z) shown in (5.1.19) can be written as
     2
1 (x + t12 y)2 1 − t12
2
y
f 3 (x, y, z) =  exp −   exp −  
8π 3 |K 3 | 2 1 − ρ212 2 1 − ρ12 2
 
1 − ρ212
× exp − (z + b12 ) ,
2
(5.E.28)
2 |K 3 |
 
where t12 = |K q12
3|
and b12 = c231−ρ
y+c31 x
2 with q12 = c12 1 − ρ212 − c23 c31 and ci j =
ρ jk ρki − ρi j .
12

(2) Show that lim t12 = −ξ12 and


ρ12 →±1

1 − ρ212 1
lim = . (5.E.29)
ρ12 →±1 |K 3 | 1 − ρ223
α  
Subsequently, using lim π
exp −αx 2 = δ(x), show that
α→∞

 
1 (x + t12 y)2 δ(x − ξ12 y)
lim  exp −   =  , (5.E.30)
ρ12 →±1 8π 3 |K 3 | 2 1 − ρ212 2π 1 − ρ223

 
where ξi j = sgn ρi j .
2
1−t12
(3) Show that lim 2 = 1, which instantly yields
ρ12 →±1 1−ρ12
412 5 Normal Random Vectors
   2  
1 − t12
2
y 1 2
lim exp −   = exp − y . (5.E.31)
ρ12 →±1 2 1 − ρ212 2

(4) Using (5.E.29), show that


   
1 − ρ212 (z − μ1 (x, y))2
lim exp − (z + b12 ) 2
= exp −   ,
ρ12 →±1 2 |K 3 | 2 1 − ρ223
(5.E.32)

where μ1 (x, y) = 21 ξ12 (ρ23 x + ρ31 y). Combining (5.E.30), (5.E.31), and
(5.E.32) into (5.E.28), and noting that ρ23 = ξ12 ρ31 when ρ12 → ±1 and that
y can be replaced with ξ12 x due to the function δ(x − ξ12 y), we get (5.1.36).
(5) Obtain (5.1.37) from (5.1.36).

Exercise 5.42 Assume (X, Y ) has the standard bi-variate normal pdf φ2 .
X 2 −2ρX Y +Y 2
(1) Obtain the pdf and cdf of V = g(X, Y ) = 2(1−ρ2 )
.
(2) Note that φ2 (x, y) = c is equivalent to x −2ρx y + y = c1 , an ellipse, for
2 2

positive constants c and c1 . Show that c1 = −2 1 − ρ2 ln(1 − α) for the ellipse


containing 100α% of the distribution of (X, Y ).

 x
Exercise 5.43 Consider (X 1 , X 2 ) ∼ N (0, 0, 1, 1, ρ) and g(x) = 2β 0
α
φ(z)dz,
i.e.,
x 
g(x) = β 2Φ −1 , (5.E.33)
α

where α > 0, β > 0, Φ is the standard normal cdf, and φ is the standard normal
pdf. Obtain the correlation RY = E {Y1 Y2 } and correlation coefficient ρY between
Y1 = g (X 1 ) and Y2 = g (X 2 ). Obtain the values of ρY when α2 = 1 and α2 → ∞.
Note that g is a smoothly increasing function from −β to β. When α = 1, we have
β {2Φ (X i ) − 1} ∼ U (−β, β) because Φ(X ) ∼ U (0, 1) when X ∼ Φ from (3.2.50).

Exercise 5.44 In Figs. 5.1, 5.2 and 5.3, show that the angle θ between the major
axis of the ellipse and the positive x-axis can be expressed as (5.1.9).

References

M. Abramowitz, I.A. Stegun (eds.), Handbook of Mathematical Functions (Dover, New York, 1972)
J. Bae, H. Kwon, S.R. Park, J. Lee, I. Song, Explicit correlation coefficients among random variables,
ranks, and magnitude ranks. IEEE Trans. Inform. Theory 52(5), 2233–2240 (2006)
References 413

W. Bär, F. Dittrich, Useful formula for moment computation of normal random variables with
nonzero means. IEEE Trans. Automat. Control 16(3), 263–265 (1971)
R.F. Baum, The correlation function of smoothly limited Gaussian noise. IRE Trans. Inform. Theory
3(3), 193–197 (1957)
J.L. Brown Jr., On a cross-correlation property of stationary processes. IRE Trans. Inform. Theory
3(1), 28–31 (1957)
W.B. Davenport Jr., Probability and Random Processes (McGraw-Hill, New York, 1970)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products (Academic, New York, 1980)
J. Hajek, Nonparametric Statistics (Holden-Day, San Francisco, 1969)
J.B.S. Haldane, Moments of the distributions of powers and products of normal variates. Biometrika
32(3/4), 226–242 (1942)
G.G. Hamedani, Nonnormality of linear combinations of normal random variables. Am. Stat. 38(4),
295–296 (1984)
B. Holmquist, Moments and cumulants of the multivariate normal distribution. Stochastic Anal.
Appl. 6(3), 273–278 (1988)
R.A. Horn, C.R. Johnson, Matrix Analysis (Cambridge University Press, Cambridge, 1985)
L. Isserlis, On a formula for the product-moment coefficient of any order of a normal frequency
distribution in any number of variables. Biometrika 12(1/2), 134–139 (1918)
N.L. Johnson, S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions (Wiley,
New York, 1972)
A.R. Kamat, Incomplete moments of the trivariate normal distribution. Indian J. Stat. 20(3/4),
321–322 (1958)
R. Kan, From moments of sum to moments of product. J. Multivariate Anal. 99(3), 542–554 (2008)
S. Kotz, N. Balakrishnan, N.L. Johnson, Continuous Multivariate Distributions, 2nd edn. (Wiley,
New York, 2000)
E.L. Melnick, A. Tenenbein, Misspecification of the normal distribution. Am. Stat. 36(4), 372–373
(1982)
D. Middleton, An Introduction to Statistical Communication Theory (McGraw-Hill, New York,
1960)
G.A. Mihram, A cautionary note regarding invocation of the central limit theorem. Am. Stat. 23(5),
38 (1969)
T.M. Mills, Problems in Probability (World Scientific, Singapore, 2001)
S. Nabeya, Absolute moments in 3-dimensional normal distribution. Ann. Inst. Stat. Math. 4(1),
15–30 (1952)
C.L. Nikias, M. Shao, Signal Processing with Alpha-Stable Distributions and Applications (Wiley,
New York, 1995)
J.K. Patel, C.H. Kapadia, D.B. Owen, Handbook of Statistical Distributions (Marcel Dekker, New
York, 1976)
J.K. Patel, C.B. Read, Handbook of the Normal Distribution, 2nd edn. (Marcel Dekker, New York,
1996)
D.A. Pierce, R.L. Dykstra, Independence and the normal distribution. Am. Stat. 23(4), 39 (1969)
R. Price, A useful theorem for nonlinear devices having Gaussian inputs. IRE Trans. Inform. Theory
4(2), 69–72 (1958)
V.K. Rohatgi, A.K. Md. E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley,
New York, 2001)
J.P. Romano, A.F. Siegel, Counterexamples in Probability and Statistics (Chapman and Hall, New
York, 1986)
I. Song, S. Lee, Explicit formulae for product moments of multivariate Gaussian random variables.
Stat. Prob. Lett. 100, 27–34 (2015)

I. Song, S. Lee, Y.H. Kim, S.R. Park, Explicit formulae and implication of the expected values of
some nonlinear statistics of tri-variate Gaussian variables. J. Korean Stat. Soc. 49(1), 117–138
(2020)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
K. Triantafyllopoulos, On the central moments of the multidimensional Gaussian distribution. Math.
Sci. 28(2), 125–128 (2003)
G.A. Tsihrintzis, C.L. Nikias, Incoherent receiver in alpha-stable impulsive noise. IEEE Trans.
Signal Process. 43(9), 2225–2229 (1995)
G.L. Wise, E.B. Hall, Counterexamples in Probability and Real Analysis (Oxford University, New
York, 1993)
C.S. Withers, The moments of the multivariate normal. Bull. Austral. Math. Soc. 32(1), 103–107
(1985)
Chapter 6
Convergence of Random Variables

In this chapter, we discuss sequences of random variables and their convergence. The
central limit theorem, one of the most important and most widely used results in
applications of random variables, will also be described.

6.1 Types of Convergence

In discussing the convergence of sequences (Grimmett and Stirzaker 1982; Thomas
1986) of random variables, we consider whether every or almost every sequence is
convergent and, if convergent, whether the sequences converge to the same value or
to different values.

6.1.1 Almost Sure Convergence

Definition 6.1.1 (sure convergence; almost sure convergence) For every point ω of
the sample space on which the random variable X_n is defined, if

lim_{n→∞} X_n(ω) = X(ω),    (6.1.1)

then the sequence {X_n}_{n=1}^∞ is called surely convergent to X, and if

P({ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1,    (6.1.2)

then the sequence {X_n}_{n=1}^∞ is called almost surely convergent to X.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 415
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8_6

Sure convergence is also called always convergence, everywhere convergence,
or certain convergence. When a sequence {X_n}_{n=1}^∞ is surely convergent to X, it is
denoted by X_n →^c X, X_n →^e X, or X_n →^s X. Sure convergence implies that all
the sequences are convergent for all ω, yet the limit value of the convergence may
depend on ω.
Almost sure convergence is synonymous with convergence with probability 1,
almost always convergence, almost everywhere convergence, and almost certain
convergence. When a sequence {X_n}_{n=1}^∞ is almost surely convergent to X, it is denoted by
X_n →^{a.c.} X, X_n →^{a.e.} X, X_n →^{a.s.} X, or X_n →^{w.p.1} X. For an almost surely convergent
sequence {X_n(ω)}_{n=1}^∞, we have lim_{n→∞} X_n(ω) = X(ω) for any ω ∈ Ω̃, where P(Ω̃) = 1
and Ω̃ ⊆ Ω. Although the sequence {X_n(ω)}_{n=1}^∞ may or may not converge for ω ∉ Ω̃,
the set of such ω has probability 0: in other words, P({ω : ω ∉ Ω̃, ω ∈ Ω}) = 0.

Example 6.1.1 Recollecting (1.5.17),

P(i.o. |X_n| > ε) = 0    (6.1.3)

for every positive number ε and

X_n →^{a.s.} 0    (6.1.4)

are the necessary and sufficient conditions of each other. ♦



When a sequence {X_n}_{n=1}^∞ of random variables is almost surely convergent, almost
every sample sequence will eventually be located within a range of 2ε for any number
ε > 0: although some sample sequences may not converge, the probability of the set
of ω producing such sequences is 0. The strong law of large numbers, which we will
consider later in this chapter, is an example of almost sure convergence.

Example 6.1.2 (Leon-Garcia 2008) For a randomly chosen point ω ∈ [0, 1], assume
that P(ω ∈ (a, b)) = b − a for 0 ≤ a ≤ b ≤ 1. Now consider the five sequences of
random variables A_n(ω) = ω/n, B_n(ω) = ω(1 − 1/n), C_n(ω) = ωe^n, D_n(ω) =
cos 2πnω, and H_n(ω) = exp{−n(nω − 1)}. The sequence {A_n(ω)}_{n=1}^∞ converges
to 0 for every value of ω ∈ [0, 1], and thus it is surely convergent to 0. The
sequence {B_n(ω)}_{n=1}^∞ converges to ω for every value of ω ∈ [0, 1], and thus it is surely
convergent to ω with the limit distribution U[0, 1]. The sequence {C_n(ω)}_{n=1}^∞ converges
to 0 when ω = 0 and diverges when ω ∈ (0, 1]: in other words, it is not
convergent. The sequence {D_n(ω)}_{n=1}^∞ converges to 1 when ω ∈ {0, 1} and oscillates
between −1 and 1 when ω ∈ (0, 1): in other words, it is not convergent.
When n → ∞, H_n(0) = e^n → ∞ for ω = 0 and H_n(ω) → 0 for ω ∈ (0, 1]: in other
words, {H_n(ω)}_{n=1}^∞ is not surely convergent. However, because P(ω ∈ (0, 1]) = 1,
{H_n(ω)}_{n=1}^∞ converges almost surely to 0. ♦
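The pointwise behavior in Example 6.1.2 is easy to check numerically. The sketch below (in Python, using the sequence definitions as reconstructed above) evaluates A_n, B_n, and H_n at a few randomly chosen points ω ∈ (0, 1] for a large n; the sample size and n are arbitrary choices.

```python
import math
import random

# Numerical check of Example 6.1.2: A_n(w) = w/n, B_n(w) = w(1 - 1/n),
# and H_n(w) = exp{-n(n w - 1)} evaluated at sample points of [0, 1].
def A(n, w):
    return w / n

def B(n, w):
    return w * (1 - 1 / n)

def H(n, w):
    e = -n * (n * w - 1)
    return math.exp(e) if e < 700 else float("inf")  # guard against overflow

random.seed(0)
omegas = [random.random() for _ in range(5)]  # w = 0 occurs with probability 0
n = 10**4

a_vals = [A(n, w) for w in omegas]  # tends to 0 for every w
b_vals = [B(n, w) for w in omegas]  # tends to w for every w
h_vals = [H(n, w) for w in omegas]  # tends to 0 for every w > 0

assert all(abs(a) < 1e-3 for a in a_vals)
assert all(abs(b - w) < 1e-3 for b, w in zip(b_vals, omegas))
assert all(h < 1e-3 for h in h_vals)
print("A_n -> 0, B_n -> omega, and H_n -> 0 on (0, 1], as claimed")
```

Note that such a check cannot distinguish sure from almost sure convergence (the exceptional point ω = 0 of H_n is never sampled); it only illustrates the pointwise limits.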

Example 6.1.3 (Stoyanov 2013) Consider a sequence {X_n}_{n=1}^∞. When

Σ_{n=1}^∞ P(|X_n| > ε) < ∞    (6.1.5)

for ε > 0, it is easy to see that X_n →^{a.s.} 0 as n → ∞ from the Borel-Cantelli lemma.
In addition, even if we change the condition (6.1.5) into

Σ_{n=1}^∞ P(|X_n| > ε_n) < ∞    (6.1.6)

for a sequence {ε_n}_{n=1}^∞ such that ε_n ↓ 0, we still have X_n →^{a.s.} 0. Now, when ω is a
randomly chosen point in [0, 1], for a sequence {X_n}_{n=1}^∞ with

X_n(ω) = { 0, 0 ≤ ω ≤ 1 − 1/n,
           1, 1 − 1/n < ω ≤ 1,    (6.1.7)

we have X_n →^{a.s.} 0 as n → ∞. However, for any ε_n such that ε_n ↓ 0, if we consider
a sufficiently large n, we have P(|X_n| > ε_n) = P(X_n = 1) = 1/n and thus
Σ_{n=1}^∞ P(|X_n| > ε_n) → ∞ for the sequence (6.1.7). In other words, (6.1.6) is a
sufficient condition for the sequence to converge almost surely to 0, but not a necessary
condition. ♦
Theorem 6.1.1 (Rohatgi and Saleh 2001; Stoyanov 2013) If X_n →^{a.s.} X for a
sequence {X_n}_{n=1}^∞, then

lim_{n→∞} P( sup_{m≥n} |X_m − X| > ε ) = 0    (6.1.8)

holds true for every ε > 0, and the converse also holds true.

Proof If X_n →^{a.s.} X, then we have X_n − X →^{a.s.} 0. Thus, let us show that

X_n →^{a.s.} 0    (6.1.9)

and

lim_{n→∞} P( sup_{m≥n} |X_m| > ε ) = 0    (6.1.10)

are the necessary and sufficient conditions of each other.



Assume (6.1.9) holds true. Let A_n(ε) = { sup_{m≥n} |X_m| > ε }, C = { lim_{n→∞} X_n =
0 }, and B_n(ε) = C ∩ A_n(ε). Then, B_{n+1}(ε) ⊆ B_n(ε) and ∩_{n=1}^∞ B_n(ε) = ∅, and thus
lim_{n→∞} P(B_n(ε)) = P( ∩_{n=1}^∞ B_n(ε) ) = 0. Recollecting P(C) = 1 and P(C^c) = 0, we
get P(C^c ∩ A_n^c(ε)) ≤ P(C^c) = 0 because C^c ∩ A_n^c(ε) ⊆ C^c. We also have
P(B_n(ε)) = P(C ∩ A_n(ε)) = 1 − P(C^c ∪ A_n^c(ε)) = 1 − P(C^c) − P(A_n^c(ε)) +
P(C^c ∩ A_n^c(ε)) = P(A_n(ε)) + P(C^c ∩ A_n^c(ε)), i.e.,

P(B_n(ε)) = P(A_n(ε)).    (6.1.11)

Therefore, we have (6.1.10).

Next, assume that (6.1.10) holds true. Letting D(ε) = { limsup_{n→∞} |X_n| > ε } for ε > 0,
we have P(D(ε)) = 0 because D(ε) ⊆ A_n(ε) for n = 1, 2, . . .. In addition, because
C^c = { lim_{n→∞} X_n = 0 }^c ⊆ ∪_{k=1}^∞ { limsup_{n→∞} |X_n| > 1/k }, we get 1 − P(C) ≤ Σ_{k=1}^∞ P(
D(1/k) ) = 0 and, consequently, (6.1.9). ♠

To show that a sequence of random variables is almost surely convergent, it is
necessary that either the distribution of ω or the relationship between ω and the
random variables be available, or that the random variables be sufficiently simple
for the convergence to be shown directly. Let us now consider a convergence weaker
than almost sure convergence. For example, we may require most of the random
variables in the sequence {X_n}_{n=1}^∞ to be close to X in the sense that E{(X_n − X)²} is small enough.
Such a convergence focuses on individual time instants, and convergence or divergence
is easier to establish in this sense than in the almost sure sense because it does not
require convergence of all the sample sequences.

6.1.2 Convergence in the Mean



Definition 6.1.2 (convergence in the r-th mean) For a sequence {X_n}_{n=1}^∞ and a random
variable X, assume that the r-th absolute moments {E{|X_n|^r}}_{n=1}^∞ and E{|X|^r}
are all finite. If

lim_{n→∞} E{|X_n − X|^r} = 0,    (6.1.12)

then {X_n}_{n=1}^∞ is said to converge to X in the r-th mean, and is denoted by X_n →^r X
or X_n →^{L^r} X.

When r = 2, convergence in the r-th mean is called convergence in the mean
square, and

lim_{n→∞} E{|X_n − X|²} = 0    (6.1.13)

is also written as

l.i.m._{n→∞} X_n = X,    (6.1.14)

where l.i.m. is the acronym of 'limit in the mean'.

Example 6.1.4 (Rohatgi and Saleh 2001) Assume the distribution

P(X_n = x) = { 1 − 1/n, x = 0,
               1/n,     x = 1    (6.1.15)

for a sequence {X_n}_{n=1}^∞. Then, the sequence converges in the mean square to X such
that P(X = 0) = 1 because lim_{n→∞} E{|X_n|²} = lim_{n→∞} 1/n = 0.

Example 6.1.5 (Leon-Garcia 2008) We have observed that the sequence
B_n(ω) = (1 − 1/n)ω in Example 6.1.2 converges surely to ω. Now, because
lim_{n→∞} E{(B_n(ω) − ω)²} = lim_{n→∞} E{(ω/n)²} = lim_{n→∞} 1/(3n²) = 0, the sequence {B_n(ω)}_{n=1}^∞
converges to ω also in the mean square.
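The rate E{(B_n − ω)²} = 1/(3n²) derived in Example 6.1.5 can be reproduced by Monte Carlo simulation; the sketch below (Python, arbitrary sample size) estimates the mean-square error for a few values of n.

```python
import random

# Monte Carlo check of Example 6.1.5: for B_n(w) = (1 - 1/n) w with
# w ~ U[0, 1], the mean-square error is E{(B_n - w)^2} = E{w^2}/n^2 = 1/(3 n^2).
random.seed(1)
samples = [random.random() for _ in range(200_000)]

def ms_error(n):
    return sum(((1 - 1 / n) * w - w) ** 2 for w in samples) / len(samples)

for n in (1, 2, 10):
    exact = 1 / (3 * n * n)
    assert abs(ms_error(n) - exact) < 5e-3
print("estimated E{(B_n - w)^2} matches 1/(3 n^2) for n = 1, 2, 10")
```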

Mean square convergence is easy to analyze and is also meaningful in engineering
applications because the quantity E{|X_n − X|²} can be regarded as the power of
an error. The Cauchy criterion shown in the following theorem allows us to see if a
sequence converges in the mean square even when we do not know the limit X:

Theorem 6.1.2 A necessary and sufficient condition for a sequence {X_n}_{n=1}^∞ to converge
in the mean square is

lim_{n,m→∞} E{|X_n − X_m|²} = 0.    (6.1.16)

Example 6.1.6 Consider the sequence {X_n}_{n=1}^∞ discussed in Example 6.1.4. Then,
we have lim_{n,m→∞} E{|X_n − X_m|²} = 0 because E{|X_n − X_m|²} = 1 × P(X_n =
0, X_m = 1) + 1 × P(X_n = 1, X_m = 0) = (1/n) P(X_m = 0 | X_n = 1) + (1/m) P(X_n =
0 | X_m = 1), i.e.,

E{|X_n − X_m|²} ≤ 1/n + 1/m.    (6.1.17)

Therefore, {X_n}_{n=1}^∞ converges in the mean square. ♦

Example 6.1.7 We have lim_{n,m→∞} E{|B_n − B_m|²} = lim_{n,m→∞} E{(1/n − 1/m)² ω²} =
E{ω²} lim_{n,m→∞} (1/n − 1/m)² = 0 for the sequence {B_n}_{n=1}^∞ in Example 6.1.5.

Mean square convergence implies that more and more sequences are close to
the limit X as n becomes larger. However, unlike in almost sure convergence, the
sequences close to X do not necessarily always stay close to X.

Example 6.1.8 (Leon-Garcia 2008) In Example 6.1.2, the sequence H_n(ω) =
exp{−n(nω − 1)} is shown to converge almost surely to 0. Now, because
lim_{n→∞} E{|H_n(ω) − 0|²} = lim_{n→∞} e^{2n} ∫_0^1 exp(−2n²ω) dω = lim_{n→∞} (e^{2n}/(2n²)){1 − exp(−2n²)}
→ ∞, the sequence {H_n(ω)}_{n=1}^∞ does not converge to 0 in the mean square.

6.1.3 Convergence in Probability and Convergence


in Distribution

Definition 6.1.3 (convergence in probability) A sequence {X_n}_{n=1}^∞ is said to converge
stochastically, or to converge in probability, to a random variable X if

lim_{n→∞} P(|X_n − X| > ε) = 0    (6.1.18)

for every ε > 0, and is denoted by X_n →^p X.

Note that (6.1.18) implies that almost every sequence is within a range of 2ε at any
given time but that the sequence is not required to stay in the range. However, (6.1.8)
dictates that a sequence is required to stay within the range 2ε once it is inside the
range. This can easily be confirmed by interpreting the meanings of {|X_n − X| > ε}
and { sup_{m≥n} |X_m − X| > ε }.

Example 6.1.9 (Rohatgi and Saleh 2001) Assume the pmf

p_{X_n}(x) = { 1 − 1/n, x = 0,
              1/n,     x = 1    (6.1.19)

for a sequence {X_n}_{n=1}^∞. Then, because

P(|X_n| > ε) = { 0,   ε ≥ 1,
                1/n, 0 < ε < 1,    (6.1.20)

we have lim_{n→∞} P(|X_n| > ε) = 0 and thus X_n →^p 0. ♦

Example 6.1.10 Assume a sequence {X_n}_{n=1}^∞ with the pmf

P(X_n = x) = { 1/(2n),   x = 3, 4,
               1 − 1/n,  x = 5.    (6.1.21)

We then have X_n →^p 5 because

P(|X_n − 5| > ε) = { 0,      ε ≥ 2,
                     1/(2n), 1 ≤ ε < 2,
                     1/n,    0 < ε < 1,    (6.1.22)

and thus lim_{n→∞} P(|X_n − 5| > ε) = 0. ♦
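A small simulation makes the statement of Example 6.1.10 concrete (a Python sketch; the choice ε = 0.5 is arbitrary):

```python
import random

# Empirical check of Example 6.1.10: X_n equals 3 or 4 with probability
# 1/(2n) each and 5 with probability 1 - 1/n, so P(|X_n - 5| > eps) = 1/n
# for 0 < eps < 1, which vanishes as n grows.
random.seed(2)

def sample_X(n):
    u = random.random()
    if u < 1 / (2 * n):
        return 3
    if u < 1 / n:
        return 4
    return 5

trials = 100_000
for n in (10, 100):
    hits = sum(abs(sample_X(n) - 5) > 0.5 for _ in range(trials))
    assert abs(hits / trials - 1 / n) < 5e-3
print("P(|X_n - 5| > 0.5) is close to 1/n and tends to 0")
```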

Theorem 6.1.3 (Rohatgi and Saleh 2001) If a sequence {X_n}_{n=1}^∞ converges to X
in probability and g is a continuous function, then {g(X_n)}_{n=1}^∞ converges to g(X) in
probability.

Theorem 6.1.3 requires g to be a continuous function: note that the theorem may
not hold true if g is not a continuous function.
 
Example 6.1.11 (Stoyanov 2013) Assume X_n ∼ N(0, σ²/n) and consider the unit
step function u(x) with u(0) = 0. Then, {X_n}_{n=1}^∞ converges in probability to a random
variable X which is almost surely 0, and u(X) is a random variable which is almost
surely 0. However, because u(X_n) = 0 and 1 each with probability 1/2, we have
u(X_n) ↛^p u(X). ♦
Definition 6.1.4 (convergence in distribution) If the cdf F_n of X_n satisfies

lim_{n→∞} F_n(x) = F(x)    (6.1.23)

for all points at which the cdf F of X is continuous, then the sequence {X_n}_{n=1}^∞ is
said to converge weakly, in law, or in distribution to X, and is written as X_n →^d X or
X_n →^l X.
Example 6.1.12 For the cdf F_n(x) = ∫_{−∞}^x (√n/(σ√(2π))) exp(−nt²/(2σ²)) dt of X_n, we have

lim_{n→∞} F_n(x) = { 0,   x < 0,
                     1/2, x = 0,
                     1,   x > 0.    (6.1.24)

Thus, {X_n}_{n=1}^∞ converges weakly to a random variable X with the cdf

F(x) = { 0, x < 0,
         1, x ≥ 0.    (6.1.25)

Note that although lim_{n→∞} F_n(0) ≠ F(0), the convergence in distribution does not
require the convergence at discontinuity points of the cdf: in short, the convergence
of {F_n(x)}_{n=1}^∞ at the discontinuity point x = 0 of F(x) is not a prerequisite for the
convergence in distribution. ♦
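Since F_n in Example 6.1.12 is the cdf of an N(0, σ²/n) random variable, it can be written as F_n(x) = Φ(√n x/σ), and the three limits in (6.1.24) are easy to verify numerically (a Python sketch with an arbitrary σ):

```python
import math

# Numerical check of Example 6.1.12: F_n(x) = Phi(sqrt(n) x / sigma) tends
# to 0 for x < 0, to 1/2 at x = 0, and to 1 for x > 0 as n grows.
sigma = 2.0

def F(n, x):
    return 0.5 * (1 + math.erf(math.sqrt(n) * x / (sigma * math.sqrt(2))))

n = 10**6
assert F(n, -0.01) < 1e-6             # x < 0
assert abs(F(n, 0.0) - 0.5) < 1e-12   # x = 0: F_n(0) = 1/2 for every n
assert F(n, 0.01) > 1 - 1e-6          # x > 0
print("F_n(-0.01), F_n(0), F_n(0.01):", F(n, -0.01), F(n, 0.0), F(n, 0.01))
```

The middle assertion shows why the discontinuity point x = 0 is excluded in Definition 6.1.4: F_n(0) = 1/2 for every n, while F(0) = 1.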

6.1.4 Relations Among Various Types of Convergence

We now discuss the relations among the various types of convergence discussed in
the previous sections. First, let

A = {collection of sequences almost surely convergent},
D = {collection of sequences convergent in distribution},
M_s = {collection of sequences convergent in the s-th mean},
M_t = {collection of sequences convergent in the t-th mean},
P = {collection of sequences convergent in probability},

and t > s > 0. Then, we have (Rohatgi and Saleh 2001)

D ⊃ P ⊃ M_s ⊃ M_t    (6.1.26)

and

P ⊃ A.    (6.1.27)

In addition, neither A and M_s nor A and M_t include each other.


Example 6.1.13 (Stoyanov 2013) Assume that the pdf of X is symmetric and let
X_n = −X. Then, because¹

X_n =^d X,    (6.1.28)

we have X_n →^d X. However, because P(|X_n − X| > ε) = P(|X| > ε/2) ↛ 0 when
n → ∞, we have X_n ↛^p X. ♦

Example 6.1.14 For the sample space S = {ω_1, ω_2, ω_3, ω_4}, assume the event space
2^S and the uniform probability measure. Define {X_n}_{n=1}^∞ by

X_n(ω) = { 0, ω = ω_3 or ω_4,
           1, ω = ω_1 or ω_2.    (6.1.29)

¹ Here, =^d means 'equal in distribution' as introduced in Example 3.5.18.

Also let

X(ω) = { 0, ω = ω_1 or ω_2,
         1, ω = ω_3 or ω_4.    (6.1.30)

Then, the cdf's of X_n and X are both

F(x) = { 0,   x < 0,
         1/2, 0 ≤ x < 1,
         1,   x ≥ 1.    (6.1.31)

In other words, X_n →^d X. Meanwhile, because |X_n(ω) − X(ω)| = 1 for ω ∈ S and
n ≥ 1, we have P(|X_n − X| > ε) ↛ 0 for n → ∞. Thus, X_n does not converge to
X in probability.

Example 6.1.15 (Stoyanov 2013) Assume that P(X_n = 1) = 1/n and P(X_n = 0) =
1 − 1/n for a sequence {X_n}_{n=1}^∞ of independent random variables. Then, we have
X_n →^p 0 because P(|X_n − 0| > ε) = P(X_n = 1) = 1/n → 0 when n → ∞ for any
ε ∈ (0, 1). Next, let A_n(ε) = {|X_n − 0| > ε} and B_m(ε) = ∪_{n=m}^∞ A_n(ε). Then, we have

P(B_m(ε)) = 1 − lim_{M→∞} P(X_n = 0, for all n ∈ [m, M]).    (6.1.32)

Noting that Π_{k=m}^∞ (1 − 1/k) = 0 for any natural number m, we get

P(B_m(ε)) = 1 − lim_{M→∞} (1 − 1/m)(1 − 1/(m+1)) ··· (1 − 1/M)
          = 1    (6.1.33)

because {X_n}_{n=1}^∞ is an independent sequence. Thus, lim_{m→∞} P(B_m(ε)) = 1 ≠ 0 and, from
Theorem 6.1.1, X_n ↛^{a.s.} 0. ♦

Example 6.1.16 (Rohatgi and Saleh 2001; Stoyanov 2013) Based on the inequality
|x|^s ≤ 1 + |x|^r for 0 < s < r, or on the Lyapunov inequality (E{|X|^s})^{1/s} ≤
(E{|X|^r})^{1/r} for 0 < s < r shown in (6.A.21), we can easily show that

X_n →^{L^r} X ⇒ X_n →^{L^s} X    (6.1.34)

for 0 < s < r. Next, if the distribution of the sequence {X_n}_{n=1}^∞ is

P(X_n = x) = { n^{−(r+s)/2},     x = n,
               1 − n^{−(r+s)/2}, x = 0    (6.1.35)

for 0 < s < r, then X_n →^{L^s} 0 because E{X_n^s} = n^{(s−r)/2} → 0 for n → ∞: however,
because E{X_n^r} = n^{(r−s)/2} → ∞, we have X_n ↛^{L^r} 0. In short,

X_n →^{L^s} 0 ⇏ X_n →^{L^r} 0    (6.1.36)

for r > s. ♦

Example 6.1.17 (Stoyanov 2013) If

P(X_n = x) = { 1/n,     x = e^n,
               1 − 1/n, x = 0    (6.1.37)

for a sequence {X_n}_{n=1}^∞, then X_n →^p 0 because P(|X_n| < ε) = P(X_n = 0) = 1 −
1/n → 1 for ε > 0 when n → ∞. However, we have X_n ↛^{L^r} 0 because E{X_n^r} =
e^{rn}/n → ∞.

Example 6.1.18 (Rohatgi and Saleh 2001) Note first that a natural number n can be
expressed uniquely as

n = 2^k + m    (6.1.38)

with integers k ∈ {0, 1, . . .} and m ∈ {0, 1, . . . , 2^k − 1}. Define a sequence {X_n}_{n=1}^∞
by

X_n(ω) = { 2^k, m/2^k ≤ ω < (m+1)/2^k,
           0,   otherwise    (6.1.39)

for n = 1, 2, . . . on Ω = [0, 1]. Assume the pmf

P(X_n = x) = { 1/2^k,     x = 2^k,
               1 − 1/2^k, x = 0.    (6.1.40)

Then, lim_{n→∞} X_n(ω) does not exist for any choice of ω ∈ Ω and, therefore, the sequence
does not converge almost surely. However, because

P(|X_n| > ε) = P(X_n > ε)
             = { 0,     ε ≥ 2^k,
                 1/2^k, 0 < ε < 2^k,    (6.1.41)

we have lim_{n→∞} P(|X_n| > ε) = 0, and thus X_n →^p 0. ♦
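The behavior in Example 6.1.18 can be watched directly by evaluating X_n at a fixed ω (a Python sketch): the value 2^k recurs once in every dyadic block of indices, so the sample sequence never settles, even though P(X_n > ε) = 2^{−k} → 0.

```python
# Evaluation of the sequence in Example 6.1.18 at a fixed point w:
# with n = 2^k + m, X_n(w) = 2^k on [m/2^k, (m+1)/2^k) and 0 elsewhere.
def X(n, w):
    k = n.bit_length() - 1          # the k with 2^k <= n < 2^{k+1}
    m = n - 2**k                    # 0 <= m <= 2^k - 1
    return 2**k if m / 2**k <= w < (m + 1) / 2**k else 0

w = 0.3
values = [X(n, w) for n in range(1, 2**12)]  # k runs over 0, 1, ..., 11

# Exactly one index in each block {2^k, ..., 2^{k+1} - 1} gives X_n(w) != 0,
# so X_n(w) is nonzero infinitely often: no almost sure convergence.
assert sum(v > 0 for v in values) == 12
assert any(v > 0 for v in values[2**11 - 1:])  # still nonzero in the last block
print("X_n(0.3) takes a nonzero value once per dyadic block of n")
```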

Example 6.1.19 (Stoyanov 2013) Consider a sequence {X_n}_{n=1}^∞ with

P(X_n = x) = { 1 − 1/n^α, x = 0,
               1/(2n^α),  x = ±n,    (6.1.42)

where α > 0. Then, Σ_{n=1}^∞ E{|X_n|^{1/2}} < ∞ when α > 3/2 because E{|X_n|^{1/2}} =
n^{−α+1/2}. Applying the Markov inequality P(X ≥ α) ≤ E{X}/α introduced in (6.A.15)
with X = |X_n|^{1/2} and α = ε^{1/2}, we have P(|X_n| > ε) ≤ ε^{−1/2} E{|X_n|^{1/2}}, and thus
Σ_{n=1}^∞ P(|X_n| > ε) < ∞ when ε > 0. Now, employing the Borel-Cantelli lemma as in
Example 6.1.3, we have X_n →^{a.s.} 0. Meanwhile, X_n ↛^{L²} 0 when α ≤ 2 because
E{|X_n|²} = n^{2−α}. In essence, we have X_n →^{a.s.} 0, yet X_n ↛^{L²} 0 for α ∈ (3/2, 2]. ♦

Example 6.1.20 (Stoyanov 2013) When X_n →^d X and Y_n →^d Y, we have X_n +
Y_n →^d X + Y if the sequences {X_n}_{n=1}^∞ and {Y_n}_{n=1}^∞ are independent of each other.
On the other hand, if the two sequences are not independent of each other, we may
have different results. For example, assume X_n →^d X ∼ N(0, 1) and let Y_n = αX_n.
Then, we have

Y_n →^d Y ∼ N(0, α²),    (6.1.43)

and the distribution of X_n + Y_n = (1 + α)X_n converges to N(0, (1 + α)²). However,
because X ∼ N(0, 1) and Y ∼ N(0, α²), the distribution of X + Y is not
necessarily N(0, (1 + α)²). In other words, if the sequences {X_n}_{n=1}^∞ and {Y_n}_{n=1}^∞
are not independent of each other, it is possible that X_n + Y_n ↛^d X + Y even when
X_n →^d X and Y_n →^d Y. ♦

6.2 Laws of Large Numbers and Central Limit Theorem

In this section, we will consider the sum of random variables and its convergence. We
will then introduce the central limit theorem (Davenport 1970; Doob 1949; Mihram
1969), one of the most useful special cases of convergence.

6.2.1 Sum of Random Variables and Its Distribution

The sum of random variables is one of the key ingredients in understanding and
applying the properties of convergence and limits. We have discussed the properties
of the sum of random variables in Chap. 4. Specifically, the sum of two random
variables as well as the cf and distribution of the sum of a number of random variables
are discussed in Examples 4.2.4, 4.2.13, and 4.3.8. We now consider the sum of a
number of random variables more generally.

Theorem 6.2.1 The expected value and variance of the sum S_n = Σ_{i=1}^n X_i of the
random variables {X_i}_{i=1}^n are

E{S_n} = Σ_{i=1}^n E{X_i}    (6.2.1)

and

Var{S_n} = Σ_{i=1}^n Var{X_i} + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^n Cov(X_i, X_j),    (6.2.2)

respectively.

Proof First, it is easy to see that E{S_n} = E{Σ_{i=1}^n X_i} = Σ_{i=1}^n E{X_i}. Next, we have
Var{Σ_{i=1}^n a_i X_i} = E{(Σ_{i=1}^n a_i (X_i − E{X_i}))²}, i.e.,

Var{Σ_{i=1}^n a_i X_i} = E{ Σ_{i=1}^n Σ_{j=1}^n a_i a_j (X_i − E{X_i})(X_j − E{X_j}) }
                       = Σ_{i=1}^n a_i² Var{X_i} + 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^n a_i a_j Cov(X_i, X_j)    (6.2.3)

when {a_i}_{i=1}^n are constants. Letting {a_i = 1}_{i=1}^n, we get (6.2.2). ♠

Theorem 6.2.2 The variance of the sum S_n = Σ_{i=1}^n X_i can be expressed as

Var{S_n} = Σ_{i=1}^n Var{X_i}    (6.2.4)

when the random variables {X_i}_{i=1}^n are uncorrelated.

Theorem 6.2.1 says that the expected value of the sum of random variables is the
sum of the expected values of the random variables. In addition, the variance of the
sum of random variables is obtained by adding the sum of the covariances between
two distinct random variables to the sum of the variances of the random variables.
Theorem 6.2.2 dictates that the variance of the sum of uncorrelated random variables
is simply the sum of the variances of the random variables.

Example 6.2.1 (Yates and Goodman 1999) Assume that the joint moments are
E{X_i X_j} = ρ^{|i−j|} for zero-mean random variables {X_i}_{i=1}^n. Obtain the expected
value and variance of Y_i = X_{i−2} + X_{i−1} + X_i for i = 3, 4, . . . , n.

Solution Using (6.2.1), we have E{Y_i} = E{X_{i−2}} + E{X_{i−1}} + E{X_i} = 0.
Next, from (6.2.2), we get

Var{Y_i} = Var{X_{i−2}} + Var{X_{i−1}} + Var{X_i}
         + 2{Cov(X_{i−2}, X_{i−1}) + Cov(X_{i−1}, X_i) + Cov(X_{i−2}, X_i)}
         = 3ρ⁰ + 2ρ¹ + 2ρ¹ + 2ρ²
         = 3 + 4ρ + 2ρ²    (6.2.5)

because Var{X_i} = ρ⁰ = 1. ♦
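A Monte Carlo cross-check of Example 6.2.1 (a Python sketch): a stationary AR(1) sequence X_i = ρX_{i−1} + √(1 − ρ²) Z_i with i.i.d. Z_i ∼ N(0, 1) has exactly the joint moments E{X_i X_j} = ρ^{|i−j|} assumed above, so the sample variance of Y = X_1 + X_2 + X_3 should match 3 + 4ρ + 2ρ²; the value ρ = 0.6 is arbitrary.

```python
import math
import random

# Monte Carlo check of Example 6.2.1 with rho = 0.6:
# Var{Y} should be 3 + 4*rho + 2*rho^2 = 6.12.
random.seed(8)
rho, trials = 0.6, 200_000
ys = []
for _ in range(trials):
    x = random.gauss(0, 1)                         # stationary start, Var = 1
    total = x
    for _ in range(2):
        x = rho * x + math.sqrt(1 - rho**2) * random.gauss(0, 1)
        total += x
    ys.append(total)

mean = sum(ys) / trials
var = sum((y - mean) ** 2 for y in ys) / trials
theory = 3 + 4 * rho + 2 * rho**2
assert abs(mean) < 0.05
assert abs(var - theory) < 0.1
print(f"sample variance {var:.3f} vs. theory {theory:.2f}")
```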

Example 6.2.2 In a meeting of a group of n people, each person attends with a gift.
The name tags of the n people are put in a box, from which each person randomly
picks one name tag: each person gets the gift brought by the person on the name
tag. Let G_n be the number of people who receive their own gifts back. Obtain the
expected value and variance of G_n.

Solution Let us define

X_i = { 1, when person i picks her/his own name tag,
        0, otherwise.    (6.2.6)

Then,

G_n = Σ_{i=1}^n X_i.    (6.2.7)

For any person, the probability of picking her/his own name tag is 1/n. Thus, E{X_i} =
1 × (1/n) + 0 × (n − 1)/n = 1/n and Var{X_i} = E{X_i²} − E²{X_i} = 1/n − 1/n². In addition,
because P(X_i X_j = 1) = 1/(n(n−1)) and P(X_i X_j = 0) = 1 − 1/(n(n−1)) for i ≠ j, we have
Cov(X_i, X_j) = E{X_i X_j} − E{X_i}E{X_j} = 1/(n(n−1)) − 1/n², i.e.,

Cov(X_i, X_j) = 1/(n²(n − 1)).    (6.2.8)

Therefore, E{G_n} = Σ_{i=1}^n (1/n) = 1 and Var{G_n} = nVar{X_i} + n(n − 1)Cov(X_i,
X_j) = 1. In short, for any number n of the group, one person will get her/his own
gift back on average. ♦
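The surprising answer of Example 6.2.2 — mean and variance both equal to 1 for every group size — is easy to confirm by simulation (a Python sketch; n = 8 is an arbitrary group size):

```python
import random

# Monte Carlo check of Example 6.2.2: the number G_n of people who draw
# their own name tag satisfies E{G_n} = 1 and Var{G_n} = 1.
random.seed(3)
n, trials = 8, 200_000
counts = []
for _ in range(trials):
    tags = list(range(n))
    random.shuffle(tags)                    # a uniformly random assignment
    counts.append(sum(i == t for i, t in enumerate(tags)))

mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
assert abs(mean - 1) < 0.02
assert abs(var - 1) < 0.05
print(f"n = {n}: sample mean {mean:.3f}, sample variance {var:.3f}")
```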
Theorem 6.2.3 For independent random variables {X_i}_{i=1}^n, let the cf and mgf of X_i
be ϕ_{X_i}(ω) and M_{X_i}(t), respectively. Then, we have

ϕ_{S_n}(ω) = Π_{i=1}^n ϕ_{X_i}(ω)    (6.2.9)

and

M_{S_n}(t) = Π_{i=1}^n M_{X_i}(t)    (6.2.10)

as the cf and mgf, respectively, of S_n = Σ_{i=1}^n X_i.

Proof Noting that {X_i}_{i=1}^n are independent of each other, we can easily obtain the
cf ϕ_{S_n}(ω) = E{e^{jωS_n}} = E{e^{jω(X_1 + X_2 + ··· + X_n)}} = Π_{i=1}^n E{e^{jωX_i}}, i.e.,

ϕ_{S_n}(ω) = Π_{i=1}^n ϕ_{X_i}(ω)    (6.2.11)

as in (4.3.32). We can show (6.2.10) similarly. ♠


Theorem 6.2.4 For i.i.d. random variables {X_i}_{i=1}^n, let the cf and mgf of X_i be
ϕ_X(ω) and M_X(t), respectively. Then, we have the cf

ϕ_{S_n}(ω) = ϕ_X^n(ω)    (6.2.12)

and the mgf

M_{S_n}(t) = M_X^n(t)    (6.2.13)

of S_n = Σ_{i=1}^n X_i.

Proof Noting that the random variables {X_i}_{i=1}^n are all of the same distribution,
Theorem 6.2.4 follows directly from Theorem 6.2.3. ♠
Example 6.2.3 When {X_i}_{i=1}^n are i.i.d. with marginal distribution b(1, p), obtain
the distribution of S_n = Σ_{i=1}^n X_i.

Solution The mgf of X_i is M_X(t) = 1 − p + pe^t as shown in (3.A.47). Therefore,
the mgf of S_n is M_{S_n}(t) = (1 − p + pe^t)^n, implying that S_n ∼ b(n, p). ♦
Example 6.2.4 When {X_i}_{i=1}^n are independent of each other with X_i ∼ b(k_i, p),
obtain the distribution of S_n = Σ_{i=1}^n X_i.

Solution The mgf of X_i is M_{X_i}(t) = (1 − p + pe^t)^{k_i} as shown in (3.A.49). Thus,
the mgf of S_n is M_{S_n}(t) = Π_{i=1}^n (1 − p + pe^t)^{k_i}, i.e.,

M_{S_n}(t) = (1 − p + pe^t)^{Σ_{i=1}^n k_i}.    (6.2.14)

This result implies S_n ∼ b(Σ_{i=1}^n k_i, p). ♦
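The closure of the binomial family under independent sums, obtained in Examples 6.2.3 and 6.2.4 via mgf's, can also be verified by convolving pmf's directly (a Python sketch with arbitrary k_1 = 2, k_2 = 3, p = 0.3):

```python
from math import comb

# Exact check of Example 6.2.4: for independent X_1 ~ b(2, p) and
# X_2 ~ b(3, p), the pmf of X_1 + X_2 (a convolution) equals that of b(5, p).
p = 0.3

def binom_pmf(k, p):
    return [comb(k, x) * p**x * (1 - p)**(k - x) for x in range(k + 1)]

def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

lhs = convolve(binom_pmf(2, p), binom_pmf(3, p))
rhs = binom_pmf(5, p)
assert all(abs(x - y) < 1e-12 for x, y in zip(lhs, rhs))
print("pmf of b(2, p) * b(3, p) (convolution) equals pmf of b(5, p)")
```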

Example 6.2.5 We have shown in Theorem 5.2.5 that S_n = Σ_{i=1}^n X_i ∼ N(Σ_{i=1}^n m_i,
Σ_{i=1}^n σ_i²) when {X_i ∼ N(m_i, σ_i²)}_{i=1}^n are independent of each other.

Example 6.2.6 Show that S_n = Σ_{i=1}^n X_i ∼ G(n, 1/λ) when {X_i}_{i=1}^n are i.i.d. with
marginal exponential distribution of parameter λ.

Solution The mgf of X_i is M_X(t) = λ/(λ − t) as shown in (3.A.67). Thus, the mgf of S_n
is M_{S_n}(t) = {λ/(λ − t)}^n and, therefore, S_n ∼ G(n, 1/λ). ♦
Definition 6.2.1 (random sum) Assume that the support of the pmf of a random
variable N is a subset of {0, 1, . . .} and that the random variables {X_1, X_2, . . . ,
X_N} are independent of N. The random variable

S_N = Σ_{i=1}^N X_i    (6.2.15)

is called the random sum or variable sum, where we assume S_0 = 0.


 
The mgf of the random sum S N can be expressed as M SN (t) = E et SN =
     ∞    
∞  
E E et SN  N = E et SN  N = n p N (n) = E et Sn p N (n), i.e.,
n=0 n=0



M SN (t) = M Sn (t) p N (n), (6.2.16)
n=0


n
where p N (n) is the pmf of N and M Sn (t) is the mgf of Sn = Xi .
i=1

Theorem 6.2.5 When the random variables {X_i}_{i=1}^N are i.i.d. with marginal mgf
M_X(t), the mgf of the random sum S_N = Σ_{i=1}^N X_i can be obtained as

M_{S_N}(t) = M_N(ln M_X(t)),    (6.2.17)

where M_N(t) is the mgf of N.




Proof Applying Theorem 6.2.4 in (6.2.16), we get M_{S_N}(t) = Σ_{n=0}^∞ M_X^n(t) p_N(n) =
Σ_{n=0}^∞ e^{n ln M_X(t)} p_N(n) = E{e^{N ln M_X(t)}}, i.e.,

M_{S_N}(t) = M_N(ln M_X(t)),    (6.2.18)

where p_N(n) is the pmf of N. ♠


 
Meanwhile, if we write M̃ N (z) as E z N , the mgf (6.2.17) can be written as
       ∞    ∞
M SN (t) = E et SN = E E et SN  N = E et SN  N = n P(N = n) =
n=0 n=0
 t X 1 +t X 2 +···+t X n
 
∞    tX   
E e P(N = n) = E e t X1
E e 2 · · · E et X n P(N = n) =
n=0


M Xn (t)P(N = n), i.e.,
n=0

M SN (t) = M̃ N (M X (t)) (6.2.19)

  ∞
using M̃ N (g(t)) = E g N (t) = g n (t)P(N = n).
n=0

Example 6.2.7 Assume that i.i.d. exponential random variables {X_n}_{n=1}^∞ with mean
1/λ are independent of a geometric random variable N with pmf p_N(k) = (1 − α)^{k−1} α
for k ∈ {1, 2, . . .}. Obtain the distribution of the random sum S_N = Σ_{i=1}^N X_i.

Solution The mgf's of N and X_i are M_N(t) = αe^t/{1 − (1 − α)e^t} and M_X(t) = λ/(λ − t),
respectively. Thus, the mgf of S_N is M_{S_N}(t) = α exp(ln λ/(λ−t)) / {1 − (1 − α) exp(ln λ/(λ−t))}, i.e.,

M_{S_N}(t) = αλ/(αλ − t).    (6.2.20)

Therefore, S_N is an exponential random variable with mean 1/(αλ). This result is also
in agreement with the intuitive interpretation that S_N is the sum of, on average, 1/α
variables of mean 1/λ. ♦
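A Monte Carlo check of Example 6.2.7 (a Python sketch; α = 0.25 and λ = 2 are arbitrary): both the mean and the exponential tail P(S > t) = exp(−αλt) of the random sum can be estimated directly.

```python
import math
import random

# Monte Carlo check of Example 6.2.7: with N geometric (success probability
# alpha, N >= 1) independent of i.i.d. exponential X_i of mean 1/lambda,
# S_N is exponential with mean 1/(alpha*lambda).
random.seed(4)
alpha, lam, trials = 0.25, 2.0, 200_000

def sample_S():
    n = 1
    while random.random() >= alpha:   # count Bernoulli trials to first success
        n += 1
    return sum(random.expovariate(lam) for _ in range(n))

s = [sample_S() for _ in range(trials)]
mean = sum(s) / trials
target = 1 / (alpha * lam)            # = 2.0 here
tail = sum(x > target for x in s) / trials

assert abs(mean - target) < 0.05
assert abs(tail - math.exp(-1)) < 0.01   # exponential: P(S > mean) = e^{-1}
print(f"mean {mean:.3f} (target {target}), P(S > mean) {tail:.3f}")
```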

Theorem 6.2.6 When the random variables {X_i} are i.i.d., we have the expected
value

E{S_N} = E{N}E{X}    (6.2.21)

and the variance

Var{S_N} = E{N}Var{X} + Var{N}E²{X}    (6.2.22)

of the random sum S_N = Σ_{i=1}^N X_i.

Proof
(Method 1) From (6.2.17), we have M'_{S_N}(t) = M'_N(ln M_X(t)) M'_X(t)/M_X(t) and

M''_{S_N}(t) = M''_N(ln M_X(t)) {M'_X(t)/M_X(t)}²
             + M'_N(ln M_X(t)) {M_X(t)M''_X(t) − (M'_X(t))²}/M_X²(t).    (6.2.23)

Now, recollecting M_X(0) = 1, we get E{S_N} = M'_{S_N}(0) = M'_N(0)M'_X(0), i.e.,

E{S_N} = E{N}E{X}    (6.2.24)

and E{S_N²} = M''_{S_N}(0) = M''_N(0){M'_X(0)}² + M'_N(0){M''_X(0) − (M'_X(0))²}, i.e.,

E{S_N²} = E{N²}E²{X} + E{N}Var{X}.    (6.2.25)

Combining (6.2.24) and (6.2.25), we have (6.2.22).

(Method 2) Because E{Y|N} = Σ_{i=1}^N E{X_i} = N E{X} from (4.4.40) with Y = S_N,
we get the expected value of Y as E{Y} = E{E{Y|N}} = E{N E{X}}, i.e.,

E{Y} = E{N}E{X}.    (6.2.26)

Similarly, recollecting that Y² = Σ_{i=1}^N X_i² + Σ_{i=1}^N Σ_{j=1, j≠i}^N X_i X_j, the second moment
E{Y²} = E{E{Y² | N}} can be evaluated as
     
E{Y²} = E{N E{X²} + N(N − 1)E²{X}}
      = E{N}{E{X²} − E²{X}} + E{N²}E²{X}
      = E{N}Var{X} + E{N²}E²{X}.    (6.2.27)

From (6.2.26) and (6.2.27), we can obtain (6.2.22). ♠


Example 6.2.8 Assume that i.i.d. random variables {X_n ∼ N(m, σ²)}_{n=1}^∞ are independent
of N ∼ P(λ). Then, the random sum S_N has the expected value E{S_N} = λm
and variance Var{S_N} = λ(σ² + m²).
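The specialization E{S_N} = λm and Var{S_N} = λ(σ² + m²) of Theorem 6.2.6 in Example 6.2.8 can be confirmed by a short simulation (a Python sketch; the Poisson sampler below uses inverse-transform sampling, and the parameter values are arbitrary):

```python
import math
import random

# Monte Carlo check of Example 6.2.8: for N ~ P(lambda) independent of
# i.i.d. X_i ~ N(m, sigma^2), Theorem 6.2.6 gives E{S_N} = lambda*m and
# Var{S_N} = lambda*(sigma^2 + m^2).
random.seed(5)
lam, m, sigma, trials = 3.0, 1.5, 0.5, 200_000

def sample_poisson(lam):
    u, k, p = random.random(), 0, math.exp(-lam)
    c = p
    while u > c:                      # accumulate pmf terms until u is covered
        k += 1
        p *= lam / k
        c += p
    return k

s = [sum(random.gauss(m, sigma) for _ in range(sample_poisson(lam)))
     for _ in range(trials)]
mean = sum(s) / trials
var = sum((x - mean) ** 2 for x in s) / trials
assert abs(mean - lam * m) < 0.05                # lam*m = 4.5
assert abs(var - lam * (sigma**2 + m**2)) < 0.15 # lam*(sigma^2+m^2) = 7.5
print(f"mean {mean:.2f} (target 4.5), variance {var:.2f} (target 7.5)")
```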

Let us note two observations.

(1) When the random variable N is a constant n: because E{N} = n and Var{N} =
0, we have² E{S_n} = nE{X} and Var{S_n} = nVar{X} from Theorem 6.2.6.
(2) When the random variable X_i is a constant x: because E{X} = x, Var{X} = 0,
and S_N = xN, we have E{S_N} = xE{N} and Var{S_N} = x²Var{N}.

6.2.2 Laws of Large Numbers

We now consider the limit of a sequence {X_i}_{i=1}^∞ by taking into account the sum
S_n = Σ_{i=1}^n X_i for n → ∞.

6.2.2.1 Weak Law of Large Numbers

Definition 6.2.2 (weak law of large numbers) When we have

(S_n − a_n)/b_n →^p 0    (6.2.28)

for two sequences {a_n}_{n=1}^∞ and {b_n > 0}_{n=1}^∞ of real numbers such that b_n ↑ ∞, the
sequence {X_i}_{i=1}^∞ is said to follow the weak law of large numbers.

sequence {X i }i=1 is called to follow the weak law of large numbers.

In Definition 6.2.2, {an }∞ ∞


n=1 and {bn }n=1 are called the central constants and nor-
malizing constants, respectively. Note that (6.2.28) can be expressed as
 
 Sn − an 

lim P  ≥ε =0 (6.2.29)
n→∞ b 
n

for every positive number ε.

2 This result is the same as (6.2.1) and (6.2.4).



Theorem 6.2.7 (Rohatgi and Saleh 2001) Assume a sequence of uncorrelated random
variables {X_i}_{i=1}^∞ with means E{X_i} = m_i and variances Var{X_i} = σ_i². If

Σ_{i=1}^∞ σ_i² → ∞,    (6.2.30)

then

(Σ_{i=1}^n σ_i²)^{−1} (S_n − Σ_{i=1}^n m_i) →^p 0.    (6.2.31)

In other words, the sequence {X_i}_{i=1}^∞ satisfies the weak law of large numbers with
a_n = Σ_{i=1}^n m_i and b_n = Σ_{i=1}^n σ_i².
i=1 i=1
Proof Employing the Chebyshev inequality P(|Y − E{Y}| ≥ ε) ≤ Var{Y}/ε² introduced
in (6.A.16), we have

P( |S_n − Σ_{i=1}^n m_i| > ε Σ_{i=1}^n σ_i² ) ≤ (ε Σ_{i=1}^n σ_i²)^{−2} E{ (Σ_{i=1}^n (X_i − m_i))² }
                                             = (ε² Σ_{i=1}^n σ_i²)^{−1}.    (6.2.32)

In short, P( |S_n − Σ_{i=1}^n m_i| > ε Σ_{i=1}^n σ_i² ) → 0 when n → ∞. ♠

Example 6.2.9 (Rohatgi and Saleh 2001) If an uncorrelated sequence {X_i}_{i=1}^∞ with
means E{X_i} = m_i and variances Var{X_i} = σ_i² satisfies

lim_{n→∞} (1/n²) Σ_{i=1}^n σ_i² = 0,    (6.2.33)

then (1/n)(S_n − Σ_{i=1}^n m_i) →^p 0. This result, called the Markov theorem, can be easily
shown with steps similar to those in the proof of Theorem 6.2.7. Here, (6.2.33)
is called the Markov condition. ♦

Example 6.2.10 (Rohatgi and Saleh 2001) Assume an uncorrelated sequence
{X_i}_{i=1}^∞ with identical distribution, mean E{X_i} = m, and variance Var{X_i} = σ².
Then, because Σ_{i=1}^∞ σ² → ∞, we have (1/σ²)(S_n/n − m) →^p 0 from Theorem 6.2.7. Here,
a_n = nm and b_n = nσ².
434 6 Convergence of Random Variables

From now on, we assume bn = n in discussing the weak law of large numbers
unless specified otherwise.
Example 6.2.11 (Rohatgi and Saleh 2001) For an i.i.d. sequence of random variables with distribution $b(1, p)$, noting that the mean is $p$ and the variance is $p(1-p)$, we have $\frac{S_n}{n} \overset{p}{\to} p$ from Theorem 6.2.7 and Example 6.2.9. ♦
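The convergence in Example 6.2.11 is easy to observe numerically. The sketch below is our own illustration (the helper `bernoulli_sample_mean` is not from the text): it estimates $S_n/n$ for $b(1, p)$ trials at several $n$, and the sample mean settles near $p$ as $n$ grows.

```python
import random

def bernoulli_sample_mean(n, p, rng):
    """Return S_n / n for n independent b(1, p) trials."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(0)
p = 0.3
for n in (10, 1_000, 100_000):
    # the deviation |S_n/n - p| shrinks roughly like 1/sqrt(n)
    print(n, bernoulli_sample_mean(n, p, rng))
```

For $n = 100{,}000$ the standard deviation of $S_n/n$ is $\sqrt{p(1-p)/n} \approx 0.0014$, so the printed estimate sits well within $0.01$ of $p$.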
Example 6.2.12 (Rohatgi and Saleh 2001) For a sequence of i.i.d. random variables with marginal distribution $C(1, 0)$, we have $\frac{S_n}{n} \sim C(1, 0)$ as discussed in Exercise 6.3. In other words, because $\frac{S_n}{n}$ does not converge to 0 in probability, the weak law of large numbers does not hold for sequences of i.i.d. Cauchy random variables.
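This Cauchy failure can also be checked by simulation. The following sketch (our own illustration) draws standard Cauchy variates by the inverse-cdf method and shows that sample means of independent batches do not concentrate around any value, in contrast with the Bernoulli case of Example 6.2.11.

```python
import math
import random

def cauchy_sample_mean(n, rng):
    """S_n / n for n i.i.d. C(1, 0) variates, generated via tan(pi*(U - 1/2))."""
    return sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n

rng = random.Random(1)
means = [cauchy_sample_mean(1000, rng) for _ in range(200)]
# S_n / n is again C(1, 0), so large sample means keep appearing no matter how big n is
print(max(abs(m) for m in means))
```

Since each batch mean is itself $C(1,0)$, the chance that all 200 batch means stay inside $(-1, 1)$ is $2^{-200}$, i.e., large outliers are essentially guaranteed.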
Example 6.2.13 (Rohatgi and Saleh 2001; Stoyanov 2013) For an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$, if the absolute mean $E\{|X_i|\}$ is finite, then $\frac{S_n}{n} \overset{p}{\to} E\{X_1\}$ when $n \to \infty$ from Theorem 6.2.7 and Example 6.2.9. This result is called Khintchine's theorem.

6.2.2.2 Strong Law of Large Numbers

Definition 6.2.3 (strong law of large numbers) When we have

$$\frac{S_n - a_n}{b_n} \overset{a.s.}{\longrightarrow} 0 \qquad (6.2.34)$$

for two sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n > 0\}_{n=1}^{\infty}$ of real numbers such that $b_n \uparrow \infty$, the sequence $\{X_i\}_{i=1}^{\infty}$ is said to follow the strong law of large numbers.

Note that (6.2.34) implies

$$P\left( \lim_{n \to \infty} \frac{S_n - a_n}{b_n} = 0 \right) = 1. \qquad (6.2.35)$$

A sequence of random variables that follows the strong law of large numbers also follows the weak law of large numbers because almost sure convergence implies convergence in probability. As in the discussion of the weak law of large numbers, we often assume the normalizing constant $b_n = n$ also for the strong law of large numbers. We now consider sufficient conditions for a sequence $\{X_i\}_{i=1}^{\infty}$ to follow the strong law of large numbers when $b_n = n$.


Theorem 6.2.8 (Rohatgi and Saleh 2001) The sum $\sum_{i=1}^{\infty} (X_i - \mu_i)$ converges almost surely if

$$\sum_{i=1}^{\infty} \sigma_i^2 < \infty \qquad (6.2.36)$$

for a sequence $\{X_i\}_{i=1}^{\infty}$ with means $\{\mu_i\}_{i=1}^{\infty}$ and variances $\{\sigma_i^2\}_{i=1}^{\infty}$.


Theorem 6.2.9 (Rohatgi and Saleh 2001) If $\sum_{i=1}^{\infty} x_i$ converges for a sequence $\{x_n\}_{n=1}^{\infty}$, then $\lim_{n \to \infty} \frac{1}{b_n} \sum_{i=1}^{n} b_i x_i = 0$ for $\{b_n\}_{n=1}^{\infty}$ such that $b_n \uparrow \infty$. This result is called the Kronecker lemma.

Example 6.2.14 (Rohatgi and Saleh 2001) Let the means and variances be $\{\mu_i\}_{i=1}^{\infty}$ and $\{\sigma_i^2\}_{i=1}^{\infty}$, respectively, for independent random variables $\{X_i\}_{i=1}^{\infty}$. Then, we can easily show that

$$\frac{1}{b_n} \left( S_n - E\{S_n\} \right) \overset{a.s.}{\longrightarrow} 0 \qquad (6.2.37)$$

from Theorems 6.2.8 and 6.2.9 when

$$\sum_{i=1}^{\infty} \frac{\sigma_i^2}{b_i^2} < \infty, \quad b_i \uparrow \infty. \qquad (6.2.38)$$

When $b_n = n$, (6.2.38) can be expressed as

$$\sum_{n=1}^{\infty} \frac{\sigma_n^2}{n^2} < \infty, \qquad (6.2.39)$$

which is called the Kolmogorov condition. ♦

Example 6.2.15 (Rohatgi and Saleh 2001) If the variances $\{\sigma_n^2\}_{n=1}^{\infty}$ of independent random variables are uniformly bounded, i.e., $\sigma_n^2 \leq M$ for a finite number $M$, then

$$\frac{1}{n} \left( S_n - E\{S_n\} \right) \overset{a.s.}{\longrightarrow} 0 \qquad (6.2.40)$$

from the Kolmogorov condition because $\sum_{n=1}^{\infty} \frac{\sigma_n^2}{n^2} \leq M \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty$. ♦
n=1 n=1

Example 6.2.16 (Rohatgi and Saleh 2001) Based on the result in Example 6.2.15, it is easy to see that Bernoulli trials with probability of success $p$ satisfy the strong law of large numbers because the variance $p(1-p)$ is no larger than $\frac{1}{4}$.

Note that the Markov condition (6.2.33) and the Kolmogorov condition (6.2.38)
are sufficient conditions but are not necessary conditions.

Theorem 6.2.10 (Rohatgi and Saleh 2001) If the fourth moment is finite for an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with mean $E\{X_i\} = \mu$, then

$$P\left( \lim_{n \to \infty} \frac{S_n}{n} = \mu \right) = 1. \qquad (6.2.41)$$

In other words, $\frac{S_n}{n}$ converges almost surely to $\mu$.

Proof Let the variance of $X_i$ be $\sigma^2$. Then, we have

$$E\left[ \left\{ \sum_{i=1}^{n} (X_i - \mu) \right\}^4 \right] = E\left\{ \sum_{i=1}^{n} (X_i - \mu)^4 \right\} + 3E\left\{ \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} (X_i - \mu)^2 (X_j - \mu)^2 \right\} = nE\left\{ (X_1 - \mu)^4 \right\} + 3n(n-1)\sigma^4 \leq cn^2 \qquad (6.2.42)$$

for an appropriate constant $c$. From this result and the Bienayme-Chebyshev inequality (6.A.25), we get $P\left( \left| \sum_{i=1}^{n} (X_i - \mu) \right| > n\varepsilon \right) \leq \frac{1}{(n\varepsilon)^4} E\left[ \left\{ \sum_{i=1}^{n} (X_i - \mu) \right\}^4 \right] \leq \frac{cn^2}{(n\varepsilon)^4} = \frac{c'}{n^2}$ and, consequently, $\sum_{n=1}^{\infty} P\left( \left| \frac{S_n}{n} - \mu \right| > \varepsilon \right) < \infty$, where $c' = \frac{c}{\varepsilon^4}$. Therefore, letting $A_\varepsilon = \limsup_{n \to \infty} \left\{ \left| \frac{S_n}{n} - \mu \right| > \varepsilon \right\}$, we get

$$P(A_\varepsilon) = 0 \qquad (6.2.43)$$

from the Borel-Cantelli lemma discussed in Theorem 2.A.3. Now, $\{A_\varepsilon\}$ is an increasing sequence as $\varepsilon \to 0$, and converges to $\left\{ \omega : \limsup_{n \to \infty} \left| \frac{S_n}{n} - \mu \right| > 0 \right\}$ or, equivalently, to the complement of $\left\{ \omega : \lim_{n \to \infty} \frac{S_n}{n} = \mu \right\}$: thus, we have $P\left( \lim_{n \to \infty} \frac{S_n}{n} \neq \mu \right) = P\left( \lim_{\varepsilon \to 0} A_\varepsilon \right) = \lim_{\varepsilon \to 0} P(A_\varepsilon) = 0$ from (6.2.43). Subsequently, we get (6.2.41). ♠


Example 6.2.17 (Rohatgi and Saleh 2001) Consider an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ and a positive number $B$. If $P(|X_i| < B) = 1$ for every $i$, then $\frac{S_n}{n}$ converges almost surely to the mean $E\{X_i\}$ of $X_i$. This can be shown easily from Theorem 6.2.10 by noting that the fourth moment is finite when $P(|X_i| < B) = 1$. ♦

Theorem 6.2.11 (Rohatgi and Saleh 2001) Consider an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with mean $\mu$. If

$$E\{|X_i|\} < \infty, \qquad (6.2.44)$$

then

$$\frac{S_n}{n} \overset{a.s.}{\longrightarrow} \mu. \qquad (6.2.45)$$

The converse also holds true.
Note that the conditions in Theorems 6.2.10 and 6.2.11 are on the fourth moment
and absolute mean, respectively.
Example 6.2.18 (Stoyanov 2013) Consider an independent sequence $\{X_n\}_{n=2}^{\infty}$ with the pmf

$$p_{X_n}(x) = \begin{cases} 1 - \frac{1}{n \log n}, & x = 0, \\ \frac{1}{2n \log n}, & x = \pm n. \end{cases} \qquad (6.2.46)$$

Letting $A_n = \{|X_n| \geq n\}$ for $n \geq 2$, we get $\sum_{n=2}^{\infty} P(A_n) \to \infty$ because $P(A_n) = \frac{1}{n \log n}$. In other words, $\sum_{n=2}^{\infty} P(A_n)$ is divergent and $\{X_n\}_{n=2}^{\infty}$ are independent: therefore, the probability $P(|X_n| \geq n \text{ occurs i.o.}) = P\left( \left| \frac{X_n}{n} \right| \geq 1 \text{ occurs i.o.} \right)$ of $\{A_n \text{ occurs i.o.}\}$ is 1, i.e.,

$$P(|X_n| \geq n \text{ occurs i.o.}) = 1 \qquad (6.2.47)$$

from the Borel-Cantelli lemma, which precludes $\frac{S_n}{n} \overset{a.s.}{\longrightarrow} 0$. In short, the sequence $\{X_n\}_{n=2}^{\infty}$ does not follow the strong law of large numbers. On the other hand, the sequence $\{X_n\}_{n=2}^{\infty}$, satisfying the Markov condition as

$$\frac{1}{n^2} \sum_{k=2}^{n} \mathrm{Var}\{X_k\} \leq \frac{2}{n^2 \log 2} + \frac{1}{n^2} \int_{3}^{n+1} \frac{x}{\log x} \, dx \leq \frac{2}{n^2 \log 2} + \frac{(n-2)(n+1)}{n^2 \log n} \to 0 \qquad (6.2.48)$$

from $\mathrm{Var}\{X_k\} = \frac{k}{\log k}$, follows the weak law of large numbers.

6.2.3 Central Limit Theorem

Let us now discuss the central limit theorem (Feller 1970; Gardner 2010; Rohatgi and Saleh 2001), the basis for the wide-spread and most popular use of the normal distribution. Assume a sequence $\{X_n\}_{n=1}^{\infty}$ and the sum $S_n = \sum_{k=1}^{n} X_k$. Assume

$$\frac{1}{b_n} (S_n - a_n) \overset{l}{\to} Y \qquad (6.2.49)$$

for appropriately chosen sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n > 0\}_{n=1}^{\infty}$ of constants. It is known that the distribution of the limit random variable $Y$ is always a stable distribution. For example, for an i.i.d. sequence $\{X_n\}_{n=1}^{\infty}$, we have $\frac{S_n}{\sqrt{n}} \sim N(0, 1)$ if $X_i \sim N(0, 1)$ and $\frac{S_n}{n} \sim C(1, 0)$ if $X_i \sim C(1, 0)$: the normal and Cauchy distributions are typical examples of the stable distribution. In this section, we discuss the conditions under which the limit random variable $Y$ has a normal distribution.

Example 6.2.19 (Rohatgi and Saleh 2001) Assume an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with marginal distribution $b(1, p)$. Letting $a_n = E\{S_n\} = np$ and $b_n = \sqrt{\mathrm{Var}\{S_n\}} = \sqrt{np(1-p)}$, the mgf $M_n(t) = E\left\{ \exp\left( \frac{S_n - np}{\sqrt{np(1-p)}} t \right) \right\} = E\left\{ \exp\left( \sum_{i=1}^{n} \frac{X_i - p}{\sqrt{np(1-p)}} t \right) \right\}$ of $\frac{S_n - a_n}{b_n}$ can be obtained as

$$M_n(t) = \exp\left( -\frac{npt}{\sqrt{np(1-p)}} \right) \left\{ (1-p) + p \exp\left( \frac{t}{\sqrt{np(1-p)}} \right) \right\}^n = \left\{ (1-p) \exp\left( -\frac{pt}{\sqrt{np(1-p)}} \right) + p \exp\left( \frac{(1-p)t}{\sqrt{np(1-p)}} \right) \right\}^n = \left\{ 1 + \frac{t^2}{2n} + o\left( \frac{1}{n} \right) \right\}^n. \qquad (6.2.50)$$

Thus, $M_n(t) \to \exp\left( \frac{t^2}{2} \right)$ when $n \to \infty$ and, subsequently,

$$P\left( \frac{S_n - np}{\sqrt{np(1-p)}} \leq x \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left( -\frac{t^2}{2} \right) dt \qquad (6.2.51)$$

because $\exp\left( \frac{t^2}{2} \right)$ is the mgf of $N(0, 1)$. ♦

Theorem 6.2.12 (Rohatgi and Saleh 2001) For i.i.d. random variables $\{X_i\}_{i=1}^{\infty}$ with mean $m$ and variance $\sigma^2$, we have

$$\frac{S_n - nm}{\sqrt{n\sigma^2}} \overset{l}{\to} Z, \qquad (6.2.52)$$

where $Z \sim N(0, 1)$.


 
Proof Letting $Y_i = X_i - m$, we have $E\{Y_i\} = 0$ and $E\{Y_i^2\} = \sigma^2$. Also let $V_i = \frac{Y_i}{\sqrt{n\sigma^2}}$. Denoting the pdf of $Y_i$ by $f_Y$, the cf $\varphi_V(\omega) = E\left\{ \exp\left( \frac{j\omega Y_i}{\sqrt{n\sigma^2}} \right) \right\} = \int_{-\infty}^{\infty} \exp\left( \frac{j\omega y}{\sqrt{n\sigma^2}} \right) f_Y(y) dy = \int_{-\infty}^{\infty} \left\{ 1 + \frac{j\omega}{\sqrt{n\sigma^2}} y + \frac{1}{2} \left( \frac{j\omega}{\sqrt{n\sigma^2}} \right)^2 y^2 + \frac{1}{6} \left( \frac{j\omega}{\sqrt{n\sigma^2}} \right)^3 y^3 + \cdots \right\} f_Y(y) dy$ of $V_i$ can be obtained as

$$\varphi_V(\omega) = 1 + \frac{j\omega}{\sqrt{n\sigma^2}} E\{Y\} + \frac{1}{2} \left( \frac{j\omega}{\sqrt{n\sigma^2}} \right)^2 E\{Y^2\} + \cdots = 1 - \frac{\omega^2}{2n} + o\left( \frac{1}{n} \right). \qquad (6.2.53)$$

Next, letting $Z_n = \frac{S_n - nm}{\sqrt{n\sigma^2}} = \sum_{i=1}^{n} V_i$ and denoting the cf of $Z_n$ by $\varphi_{Z_n}$, we have $\lim_{n \to \infty} \varphi_{Z_n}(\omega) = \lim_{n \to \infty} \varphi_V^n(\omega) = \lim_{n \to \infty} \left\{ 1 - \frac{\omega^2}{2n} + o\left( \frac{1}{n} \right) \right\}^n$, i.e.,

$$\lim_{n \to \infty} \varphi_{Z_n}(\omega) = \exp\left( -\frac{\omega^2}{2} \right) \qquad (6.2.54)$$

from (6.2.53) because $\{V_i\}_{i=1}^{n}$ are independent. In short, the distribution of $Z_n = \frac{S_n - nm}{\sqrt{n\sigma^2}}$ converges to $N(0, 1)$ as $n \to \infty$. ♠

Theorem 6.2.12 is one of the many variants of the central limit theorem, and can be derived from Lindeberg's central limit theorem introduced in Appendix 6.2.
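Theorem 6.2.12 can be illustrated with a quick numerical experiment (our own sketch, not from the text): standardizing sums of $U(0,1)$ variates with $m = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$ yields values whose empirical distribution is close to $N(0, 1)$ even for moderate $n$.

```python
import math
import random

def standardized_sum(n, rng):
    """(S_n - nm) / sqrt(n sigma^2) for U(0, 1): m = 1/2, sigma^2 = 1/12."""
    s = sum(rng.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)

rng = random.Random(2)
zs = [standardized_sum(30, rng) for _ in range(5000)]
# for N(0, 1), P(|Z| <= 1) = Phi(1) - Phi(-1), about 0.683
frac_within_1 = sum(abs(z) <= 1 for z in zs) / len(zs)
print(frac_within_1)
```

With 5000 replications the empirical fraction has standard deviation of about $0.007$, so it lands close to the theoretical $0.683$.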

Example 6.2.20 (Rohatgi and Saleh 2001) Assume an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$ with marginal pmf $p_{X_i}(k) = (1-p)^k p \, \tilde{u}(k)$, where $0 < p < 1$ and $\tilde{u}(k)$ is the unit step function in discrete space defined in (1.4.17). We have $E\{X_i\} = \frac{q}{p}$ and $\mathrm{Var}\{X_i\} = \frac{q}{p^2}$, where $q = 1 - p$. Thus, it follows from Theorem 6.2.12 that

$$P\left( \frac{\sqrt{n} \left( p \frac{S_n}{n} - q \right)}{\sqrt{q}} \leq x \right) \to \Phi(x) \qquad (6.2.55)$$

for $x \in \mathbb{R}$ when $n \to \infty$.

The central limit theorem is useful in many cases: however, it should also be noted
that there do exist cases in which the central limit theorem does not apply.

Example 6.2.21 (Stoyanov 2013) Assume an i.i.d. sequence $\{Y_k\}_{k=1}^{\infty}$ with $P(Y_k = \pm 1) = \frac{1}{2}$, and let $X_k = \frac{\sqrt{15}}{4^k} Y_k$. Then, it is easy to see that $E\{S_n\} = 0$ and $\mathrm{Var}\{S_n\} = 1 - 16^{-n}$. In other words, when $n$ is sufficiently large, $\mathrm{Var}\{S_n\} \approx 1$. Meanwhile, because $|S_n| = |X_1 + X_2 + \cdots + X_n| \geq |X_1| - (|X_2| + |X_3| + \cdots + |X_n|) = \frac{\sqrt{15}}{4} - \frac{\sqrt{15}}{12} \left( 1 - \frac{1}{4^{n-1}} \right) \geq \frac{\sqrt{15}}{6} > \frac{1}{2}$, we have $P\left( |S_n| \leq \frac{1}{2} \right) = 0$. Thus, $P(S_n \leq x)$ does not converge to the standard normal cdf $\Phi(x)$ at some points $x$. This fact implies that $\{X_i\}_{i=1}^{\infty}$ does not satisfy the central limit theorem: the reason is that the random variable $X_1$ is so large that it virtually determines the distribution of $S_n$. ♦

The central limit theorem and laws of large numbers are satisfied for a wide range of sequences of random variables. As we have observed in Theorem 6.2.7 and Example 6.2.15, the laws of large numbers hold true for uniformly bounded independent sequences. As shown in Example 6.A.5 of Appendix 6.2, the central limit theorem holds true for an independent sequence even when the sum of variances diverges. Meanwhile, for an i.i.d. sequence $\{X_i\}_{i=1}^{\infty}$, noting that

$$P\left( \left| \frac{S_n}{n} - m \right| > \varepsilon \right) = P\left( \frac{|S_n - nm|}{\sigma \sqrt{n}} > \frac{\varepsilon \sqrt{n}}{\sigma} \right) \approx 1 - P\left( |Z| \leq \frac{\varepsilon \sqrt{n}}{\sigma} \right), \qquad (6.2.56)$$

where $Z \sim N(0, 1)$, we can obtain the laws of large numbers directly from the central limit theorem. In other words, the central limit theorem is stronger than the laws of large numbers: yet, in the laws of large numbers we are not concerned with the existence of the second moment. In some independent sequences for which the central limit theorem holds true, on the other hand, the weak law of large numbers does not hold true.

Example 6.2.22 (Feller 1970; Rohatgi and Saleh 2001) Assume the pmf $P\left( X_k = k^\lambda \right) = P\left( X_k = -k^\lambda \right) = \frac{1}{2}$ for an independent sequence $\{X_k\}_{k=1}^{\infty}$, where $\lambda > 0$. Then, the mean and variance of $X_k$ are $E\{X_k\} = 0$ and $\mathrm{Var}\{X_k\} = k^{2\lambda}$, respectively. Now, letting $s_n^2 = \sum_{k=1}^{n} \mathrm{Var}\{X_k\} = \sum_{k=1}^{n} k^{2\lambda}$, we have

$$s_n^2 \geq \frac{n^{2\lambda+1} - 1}{2\lambda + 1} \qquad (6.2.57)$$

from $\sum_{k=1}^{n} k^{2\lambda} \geq \int_{1}^{n} x^{2\lambda} dx$. Here, we can assume $n > 1$ without loss of generality, and if we let $n > \frac{2\lambda+1}{\varepsilon^2} + 1$, we have $\varepsilon^2 > \frac{2\lambda+1}{n-1}$ and $\varepsilon^2 s_n^2 > \frac{2\lambda+1}{n-1} s_n^2 \geq \frac{n^{2\lambda+1} - 1}{n-1} = n^{2\lambda} + n^{2\lambda-1} + \cdots + 1 > n^{2\lambda}$. Therefore, for $n > \frac{2\lambda+1}{\varepsilon^2} + 1$, we have $|x_{kl}| > n^\lambda$ if $|x_{kl}| > \varepsilon s_n$. Noting in addition that $P(X_k = x)$ is non-zero only when $|x| \leq n^\lambda$ for $k \leq n$, we get

$$\frac{1}{s_n^2} \sum_{k=1}^{n} \sum_{|x_{kl}| > \varepsilon s_n} x_{kl}^2 \, p_{kl} = 0. \qquad (6.2.58)$$

In short, the Lindeberg conditions³ are satisfied and the central limit theorem holds true. Now, if we consider $s_n^2 = \sum_{k=1}^{n} k^{2\lambda} \leq \int_{0}^{n+1} x^{2\lambda} dx$, i.e.,

$$s_n^2 \leq \frac{(n+1)^{2\lambda+1}}{2\lambda + 1} \qquad (6.2.59)$$

and (6.2.57), we can write $s_n \approx \sqrt{\frac{n^{2\lambda+1}}{2\lambda+1}}$. Based on this, we have

$$P\left( a < \sqrt{\frac{2\lambda+1}{n^{2\lambda+1}}} \, S_n < b \right) \to \frac{1}{\sqrt{2\pi}} \int_{a}^{b} \exp\left( -\frac{t^2}{2} \right) dt \qquad (6.2.60)$$

from Theorem 6.A.1.

Next, let us discuss if the weak law of large numbers holds true. First, when $0 < \lambda < \frac{1}{2}$, it is easy to see that the weak law of large numbers holds true based on Example 6.2.9 because $\frac{s_n^2}{n^2} \leq \frac{(n+1)^{2\lambda+1}}{n^2 (2\lambda+1)} \to 0$ from (6.2.59). When $\lambda \geq \frac{1}{2}$, however, rewriting (6.2.60), we get

$$P\left( \frac{a \, n^{\lambda - \frac{1}{2}}}{\sqrt{2\lambda+1}} < \frac{S_n}{n} < \frac{b \, n^{\lambda - \frac{1}{2}}}{\sqrt{2\lambda+1}} \right) \to \frac{1}{\sqrt{2\pi}} \int_{a}^{b} \exp\left( -\frac{t^2}{2} \right) dt, \qquad (6.2.61)$$

which implies that $P\left( -\varepsilon < \frac{S_n}{n} < \varepsilon \right) \to 1$ is not always true when $\varepsilon > 0$. Thus, the weak law of large numbers does not hold true. ♦

³ Equations (6.A.5) and (6.A.6) in Appendix 6.2 are called the Lindeberg conditions.

Let us discuss one application of the central limit theorem. First, from Theorem 6.2.12, we get the following theorem:

Theorem 6.2.13 For an i.i.d. sequence $\{X_i\}_{i=1}^{n}$ with mean $m$ and variance $\sigma^2$, we have $\sum_{k=1}^{n} X_k \sim N\left( nm, n\sigma^2 \right)$ asymptotically.

Example 6.2.23 For an i.i.d. sequence $\{X_i\}_{i=1}^{n}$ with marginal distribution $U(0, 1)$, compare the pdf of $S_n = \sum_{i=1}^{n} X_i$ with the pdf of the asymptotic distribution described in Theorem 6.2.13.

Solution From the pdf

$$f_{X_i}(x) = \begin{cases} 1, & x \in (0, 1), \\ 0, & \text{otherwise} \end{cases} \qquad (6.2.62)$$

of $X_i$, we get the pdf $f_{S_n}(x) = f_{X_1}(x) * f_{X_2}(x) * \cdots * f_{X_n}(x)$ of $S_n$. Meanwhile, from the mean $\frac{1}{2}$ and variance $\frac{1}{12}$ of $X_i$, the asymptotic distribution of $S_n$ is $N\left( \frac{n}{2}, \frac{n}{12} \right)$. Figure 6.1 shows the pdf and asymptotic pdf of $S_n$ for $n = 1, 2, 3, 4$, which confirms that the two pdf's are closer when $n$ is larger.

Example 6.2.24 Assume an i.i.d. sequence $\{X_i\}_{i=1}^{n}$ with marginal distribution $b\left( 1, \frac{1}{2} \right)$. Compare the cdf of $S_n = \sum_{i=1}^{n} X_i$ with the cdf of the asymptotic distribution described in Theorem 6.2.13.

Fig. 6.1 The pdf (blue solid line) and asymptotic pdf (black dashed line) of Sn for an i.i.d. sequence
with marginal distribution U (0, 1): (A) n = 1, (B) n = 2, (C) n = 3, (D) n = 4


Fig. 6.2 The cdf (blue solid line) of $S_n$ and approximate cdf (black dotted line) from the central limit theorem for an i.i.d. sequence with marginal distribution $b\left( 1, \frac{1}{2} \right)$: (A) $n = 4$, (B) $n = 8$, (C) $n = 16$, (D) $n = 32$

 
Solution It is easy to see that $S_n \sim b\left( n, \frac{1}{2} \right)$ from Example 6.2.3. Noting that $X_i$ has mean $\frac{1}{2}$ and variance $\frac{1}{4}$, the asymptotic distribution of $S_n$ is $N\left( \frac{n}{2}, \frac{n}{4} \right)$. Figure 6.2 shows the cdf and asymptotic cdf of $S_n$ for $n = 4, 8, 16, 32$, which confirms that the two cdf's are closer when $n$ is larger.

Theorem 6.2.14 For an i.i.d. sequence $\{X_i\}_{i=1}^{n}$ with mean $m$ and variance $\sigma^2$, we can approximate the distribution of $\frac{S_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i$ as

$$N\left( m, \frac{\sigma^2}{n} \right) \qquad (6.2.63)$$

and the cdf $F_{S_n}$ of $S_n$ as

$$F_{S_n}(x) \approx \Phi\left( \frac{x - nm}{\sqrt{n\sigma^2}} \right) \qquad (6.2.64)$$

when $n$ is sufficiently large.

Theorem 6.2.14 follows directly from Theorem 6.2.13.
Example 6.2.25 (Rohatgi and Saleh 2001) For an i.i.d. sequence $\{X_i\}_{i=1}^{n}$ with marginal distribution $b(1, p)$, the asymptotic distribution of $S_n$ is $N(np, np(1-p))$. Therefore, based on

$$P\left( \frac{S_n - np}{\sqrt{np(1-p)}} \leq \frac{x - np}{\sqrt{np(1-p)}} \right) \approx \Phi\left( \frac{x - np}{\sqrt{np(1-p)}} \right), \qquad (6.2.65)$$

we can approximately obtain $P(S_n \leq x)$ when $n$ is sufficiently large: practically, $n \geq 20$ is sufficient. When $n = 25$ and $p = \frac{1}{2}$, for example, $P(S_n \leq 12) = P\left( \frac{S_n - 12.5}{2.5} \leq \frac{-0.5}{2.5} \right) \approx P(Z \leq -0.2) \approx 0.421$, where $Z \sim N(0, 1)$. ♦
Note that taking the continuity correction⁴ into account, a better approximation can be obtained. Specifically, for an integer-valued random variable $X$, employing

$$P(x_1 \leq X \leq x_2) = P\left( x_1 - \frac{1}{2} < X < x_2 + \frac{1}{2} \right) \qquad (6.2.66)$$

will provide us with a better approximation when $x_1$ and $x_2$ are integers.

Example 6.2.26 If we take the continuity correction into account in Example 6.2.25, then $P(S_n \leq 12) = P(S_n < 12.5) \approx P(Z < 0) = 0.5$. This happens to be the exact value of the probability that we have at most twelve heads when a fair coin is tossed 25 times. ♦
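The numbers in Examples 6.2.25 and 6.2.26 can be reproduced directly. The sketch below (our own helpers, standard library only) compares the exact $b(25, \frac{1}{2})$ cdf at 12 with the plain and continuity-corrected normal approximations.

```python
import math

def binom_cdf(n, p, x):
    """Exact P(S_n <= x) for S_n ~ b(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_cdf(z):
    """Standard normal cdf Phi(z) via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, x = 25, 0.5, 12
sd = math.sqrt(n * p * (1 - p))
exact = binom_cdf(n, p, x)                      # 0.5 exactly, by symmetry
plain = normal_cdf((x - n * p) / sd)            # Phi(-0.2), about 0.421
corrected = normal_cdf((x + 0.5 - n * p) / sd)  # Phi(0) = 0.5
print(exact, plain, corrected)
```

The continuity-corrected value matches the exact probability here, while the plain approximation is off by about $0.08$; for less symmetric cases the correction reduces, rather than eliminates, the error.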

Appendices

Appendix 6.1 Convergence of Probability Functions

For a sequence $\{X_n\}_{n=1}^{\infty}$, let $F_n$ and $M_n$ be the cdf and mgf, respectively, of $X_n$. We first note that, when $n \to \infty$, the sequence of cdf's does not always converge and, even when it does, the limit is not always a cdf.

4 The continuity correction has been considered also in (3.5.18) of Chap. 3.



Example 6.A.1 Consider the cdf

$$F_n(x) = \begin{cases} 0, & x \leq 0, \\ \frac{nx}{n+1}, & 0 \leq x \leq 1 + \frac{1}{n}, \\ 1, & x \geq 1 + \frac{1}{n} \end{cases} \qquad (6.A.1)$$

of $X_n$. Then, the limit of the sequence $\{F_n\}_{n=1}^{\infty}$ is

$$\lim_{n \to \infty} F_n(x) = \begin{cases} 0, & x \leq 0, \\ x, & 0 \leq x \leq 1, \\ 1, & x \geq 1, \end{cases} \qquad (6.A.2)$$

which is a cdf. ♦
Example 6.A.2 (Rohatgi and Saleh 2001) Consider the cdf $F_n(x) = u(x - n)$ of $X_n$. The limit of the sequence $\{F_n\}_{n=1}^{\infty}$ is $\lim_{n \to \infty} F_n(x) = 0$, which is not a cdf.

Example 6.A.3 (Rohatgi and Saleh 2001) Assume the pmf $P(X_n = -n) = 1$ for a sequence $\{X_n\}_{n=1}^{\infty}$. Then, the mgf is $M_n(t) = e^{-nt}$ and its limit is $\lim_{n \to \infty} M_n(t) = M(t)$, where

$$M(t) = \begin{cases} 0, & t > 0, \\ 1, & t = 0, \\ \infty, & t < 0. \end{cases} \qquad (6.A.3)$$

The function $M(t)$ is not an mgf. In other words, the limit of a sequence of mgf's is not necessarily an mgf. ♦

Example 6.A.4 Assume the pdf $f_n(x) = \frac{n}{\pi} \frac{1}{1 + n^2 x^2}$ of $X_n$. Then, the cdf is $F_n(x) = \frac{n}{\pi} \int_{-\infty}^{x} \frac{dt}{1 + n^2 t^2}$. We also have $\lim_{n \to \infty} f_n(x) = \delta(x)$ and $\lim_{n \to \infty} F_n(x) = u(x)$. These limits imply $\lim_{n \to \infty} P(|X_n - 0| > \varepsilon) = \int_{-\infty}^{-\varepsilon} \delta(x) dx + \int_{\varepsilon}^{\infty} \delta(x) dx = 0$ and, consequently, $\{X_n\}_{n=1}^{\infty}$ converges to 0 in probability. ♦

Appendix 6.2 The Lindeberg Central Limit Theorem

The central limit theorem can be expressed in a variety of ways. Among those vari-
eties, the Lindeberg central limit theorem is one of the most general ones and does
not require the random variables to have identical distribution.

Theorem 6.A.1 (Rohatgi and Saleh 2001) For an independent sequence $\{X_i\}_{i=1}^{\infty}$, let the mean, variance, and cdf of $X_i$ be $m_i$, $\sigma_i^2$, and $F_i$, respectively. Let

$$s_n^2 = \sum_{i=1}^{n} \sigma_i^2. \qquad (6.A.4)$$

When the cdf $F_i$ is absolutely continuous, assume that the pdf $f_i(x) = \frac{d}{dx} F_i(x)$ satisfies

$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{n} \int_{|x - m_i| > \varepsilon s_n} (x - m_i)^2 f_i(x) dx = 0 \qquad (6.A.5)$$

for every value of $\varepsilon > 0$. When $\{X_i\}_{i=1}^{\infty}$ are discrete random variables, assume the pmf $p_i(x) = P(X_i = x)$ satisfies⁵

$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{n} \sum_{|x_{il} - m_i| > \varepsilon s_n} (x_{il} - m_i)^2 p_i(x_{il}) = 0 \qquad (6.A.6)$$

for every value of $\varepsilon > 0$, where $\{x_{il}\}_{l=1}^{L_i}$ are the jump points of $F_i$ with $L_i$ the number of jumps of $F_i$. Then, the distribution of

$$\frac{1}{s_n} \left( \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} m_i \right) \qquad (6.A.7)$$

converges to $N(0, 1)$ as $n \to \infty$.

Example 6.A.5 (Rohatgi and Saleh 2001) Assume an independent sequence $\{X_k \sim U(-a_k, a_k)\}_{k=1}^{\infty}$. Then, $E\{X_k\} = 0$ and $\mathrm{Var}\{X_k\} = \frac{1}{3} a_k^2$. Let $|a_k| < A$ and $s_n^2 = \sum_{k=1}^{n} \mathrm{Var}\{X_k\} = \frac{1}{3} \sum_{k=1}^{n} a_k^2 \to \infty$ when $n \to \infty$. Then, from the Chebyshev inequality $P(|Y - E\{Y\}| \geq \varepsilon) \leq \frac{\mathrm{Var}\{Y\}}{\varepsilon^2}$ discussed in (6.A.16), we get

$$\frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} x^2 \, dF_k(x) \leq \frac{A^2}{s_n^2} \sum_{k=1}^{n} \frac{\mathrm{Var}\{X_k\}}{\varepsilon^2 s_n^2} = \frac{A^2}{\varepsilon^2 s_n^2} \to 0 \qquad (6.A.8)$$

as $n \to \infty$ because $\frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} x^2 \, dF_k(x) \leq \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} A^2 \frac{1}{2a_k} \, dx = \frac{A^2}{s_n^2} \sum_{k=1}^{n} P(|X_k| > \varepsilon s_n)$.

Meanwhile, assume $\sum_{k=1}^{\infty} a_k^2 < \infty$, and let $s_n^2 \uparrow B^2$ for $n \to \infty$. Then, for a constant $k$, we can find $\varepsilon_k$ such that $\varepsilon_k B < a_k$, and we have $\varepsilon_k s_n < \varepsilon_k B$. Thus, $P(|X_k| > \varepsilon_k s_n) \geq P(|X_k| > \varepsilon_k B) > 0$. Based on this result, for $n \geq k$, we get

$$\frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|x| > \varepsilon_k s_n} x^2 \, dF_j(x) \geq \frac{s_n^2 \varepsilon_k^2}{s_n^2} \sum_{j=1}^{n} \int_{|x| > \varepsilon_k s_n} dF_j(x) = \frac{s_n^2 \varepsilon_k^2}{s_n^2} \sum_{j=1}^{n} P\left( \left| X_j \right| > \varepsilon_k s_n \right) \geq \varepsilon_k^2 P(|X_k| > \varepsilon_k s_n) > 0, \qquad (6.A.9)$$

implying that the Lindeberg condition is not satisfied. In essence, in a sequence of uniformly bounded independent random variables, a necessary and sufficient condition for the central limit theorem to hold true is $\sum_{k=1}^{\infty} \mathrm{Var}\{X_k\} \to \infty$. ♦

⁵ As mentioned in Example 6.2.22 already, (6.A.5) and (6.A.6) are called the Lindeberg condition.

Example 6.A.6 (Rohatgi and Saleh 2001) Assume an independent sequence $\{X_k\}_{k=1}^{\infty}$. Let $\delta > 0$, $\alpha_k = E\left\{ |X_k|^{2+\delta} \right\} < \infty$, and $\sum_{j=1}^{n} \alpha_j = o\left( s_n^{2+\delta} \right)$. Then, the Lindeberg condition is satisfied and the central limit theorem holds true. This can be shown easily as

$$\frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} x^2 \, dF_k(x) \leq \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| > \varepsilon s_n} \frac{|x|^{2+\delta}}{\varepsilon^\delta s_n^\delta} \, dF_k(x) \leq \frac{1}{\varepsilon^\delta s_n^{2+\delta}} \sum_{k=1}^{n} \int_{-\infty}^{\infty} |x|^{2+\delta} \, dF_k(x) = \frac{1}{\varepsilon^\delta s_n^{2+\delta}} \sum_{k=1}^{n} \alpha_k \to 0 \qquad (6.A.10)$$

because $x^2 < \frac{|x|^{2+\delta}}{\varepsilon^\delta s_n^\delta}$ from $|x|^\delta x^2 > |\varepsilon s_n|^\delta x^2$ when $|x| > \varepsilon s_n$. We can similarly show that the central limit theorem holds true for discrete random variables. ♦

The conditions (6.A.5) and (6.A.6) are necessary conditions in the following sense: for a sequence $\{X_i\}_{i=1}^{\infty}$ of independent random variables, assume the variances $\{\sigma_i^2\}_{i=1}^{\infty}$ of $\{X_i\}_{i=1}^{\infty}$ are finite. If the pdf of $X_i$ satisfies (6.A.5) or the pmf of $X_i$ satisfies (6.A.6) for every value of $\varepsilon > 0$, then

$$\lim_{n \to \infty} P\left( \frac{\overline{X}_n - E\left\{ \overline{X}_n \right\}}{\sqrt{\mathrm{Var}\left\{ \overline{X}_n \right\}}} \leq x \right) = \Phi(x) \qquad (6.A.11)$$

and

$$\lim_{n \to \infty} P\left( \max_{1 \leq k \leq n} |X_k - E\{X_k\}| > n\varepsilon \sqrt{\mathrm{Var}\left\{ \overline{X}_n \right\}} \right) = 0, \qquad (6.A.12)$$

and the converse also holds true, where $\overline{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the sample mean of $\{X_i\}_{i=1}^{n}$ defined in (5.4.1).

Appendix 6.3 Properties of Convergence

(A) Continuity of Expected Values



When the sequence $\{X_n\}_{n=1}^{\infty}$ converges to $X$, the sequence $\{E\{X_n\}\}_{n=1}^{\infty}$ of expected values will also converge to the expected value $E\{X\}$, which is called the continuity of expected values. The continuity of expected values (Gray and Davisson 2010) is a consequence of the continuity of probability discussed in Appendix 2.1.

(1) Monotonic convergence. If $0 \leq X_n \leq X_{n+1}$ for every integer $n$, then $E\{X_n\} \to E\{X\}$ as $n \to \infty$. In other words, $E\left\{ \lim_{n \to \infty} X_n \right\} = \lim_{n \to \infty} E\{X_n\}$.
(2) Dominated convergence. If $|X_n| < Y$ for every integer $n$ and $E\{Y\} < \infty$, then $E\{X_n\} \to E\{X\}$ as $n \to \infty$.
(3) Bounded convergence. If there exists a constant $c$ such that $|X_n| \leq c$ for every integer $n$, then $E\{X_n\} \to E\{X\}$ as $n \to \infty$.

(B) Properties of Convergence


We list some properties among the various types of convergence. Here, $a$ and $b$ are constants.

(1) If $X_n \overset{p}{\to} X$, then $X_n - X \overset{p}{\to} 0$, $aX_n \overset{p}{\to} aX$, and $X_n - X_m \overset{p}{\to} 0$ for $n, m \to \infty$.
(2) If $X_n \overset{p}{\to} X$ and $X_n \overset{p}{\to} Y$, then $P(X = Y) = 1$.
(3) If $X_n \overset{p}{\to} a$, then $X_n^2 \overset{p}{\to} a^2$.
(4) If $X_n \overset{p}{\to} 1$, then $\frac{1}{X_n} \overset{p}{\to} 1$.
(5) If $X_n \overset{p}{\to} X$ and $Y$ is a random variable, then $X_n Y \overset{p}{\to} XY$.
(6) If $X_n \overset{p}{\to} X$ and $Y_n \overset{p}{\to} Y$, then $X_n \pm Y_n \overset{p}{\to} X \pm Y$ and $X_n Y_n \overset{p}{\to} XY$.
(7) If $X_n \overset{p}{\to} a$ and $Y_n \overset{p}{\to} b \neq 0$, then $\frac{X_n}{Y_n} \overset{p}{\to} \frac{a}{b}$.
(8) If $X_n \overset{d}{\to} X$, then $X_n + a \overset{d}{\to} X + a$ and $bX_n \overset{d}{\to} bX$ for $b \neq 0$.
(9) If $X_n \overset{d}{\to} a$, then $X_n \overset{p}{\to} a$. Therefore, $X_n \overset{p}{\to} a \Leftrightarrow X_n \overset{d}{\to} a$.
(10) If $|X_n - Y_n| \overset{p}{\to} 0$ and $Y_n \overset{d}{\to} Y$, then $X_n \overset{d}{\to} Y$. Based on this, it can be shown that $X_n \overset{d}{\to} X$ when $X_n \overset{p}{\to} X$.
(11) If $X_n \overset{d}{\to} X$ and $Y_n \overset{p}{\to} a$, then $X_n \pm Y_n \overset{d}{\to} X \pm a$, $X_n Y_n \overset{d}{\to} aX$ for $a \neq 0$, $X_n Y_n \overset{p}{\to} 0$ for $a = 0$, and $\frac{X_n}{Y_n} \overset{d}{\to} \frac{X}{a}$ for $a \neq 0$.
(12) If $X_n \overset{r=2}{\longrightarrow} X$, then $\lim_{n \to \infty} E\{X_n\} = E\{X\}$ and $\lim_{n \to \infty} E\{X_n^2\} = E\{X^2\}$.
(13) If $X_n \overset{L^r}{\to} X$, then $\lim_{n \to \infty} E\{|X_n|^r\} = E\{|X|^r\}$.
(14) If $X_1 > X_2 > \cdots > 0$ and $X_n \overset{p}{\to} 0$, then $X_n \overset{a.s.}{\longrightarrow} 0$.

(C) Convergence and Limits of Products


Consider the product

$$A_n = \prod_{k=1}^{n} a_k. \qquad (6.A.13)$$

The infinite product $\prod_{k=1}^{\infty} a_k$ is called convergent to the limit $A$ when $A_n \to A$ and $A \neq 0$ for $n \to \infty$; divergent to 0 when $A_n \to 0$; and divergent when $A_n$ is not convergent to a non-zero value. The convergence of products is often related to the convergence of sums as shown below.

(1) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are positive, the convergence of $\prod_{k=1}^{\infty} a_k$ and that of $\sum_{k=1}^{\infty} \ln a_k$ are necessary and sufficient conditions of each other.
(2) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are positive, the convergence of $\prod_{k=1}^{\infty} (1 + a_k)$ and that of $\sum_{k=1}^{\infty} a_k$ are necessary and sufficient conditions of each other.
(3) When all the real numbers $\{a_k\}_{k=1}^{\infty}$ are non-negative, the convergence of $\prod_{k=1}^{\infty} (1 - a_k)$ and that of $\sum_{k=1}^{\infty} a_k$ are necessary and sufficient conditions of each other.

Appendix 6.4 Inequalities

In this appendix we introduce some useful inequalities (Beckenbach and Bellman 1965) in probability spaces.

(A) Inequalities for Random Variables

Theorem 6.A.2 (Rohatgi and Saleh 2001) If a measurable function $h$ is non-negative and $E\{h(X)\}$ exists for a random variable $X$, then

$$P(h(X) \geq \varepsilon) \leq \frac{E\{h(X)\}}{\varepsilon} \qquad (6.A.14)$$

for $\varepsilon > 0$, which is called the tail probability inequality.

Proof Assume $X$ is a discrete random variable. Letting $P(X = x_k) = p_k$, we have $E\{h(X)\} = \sum_{A} h(x_k) p_k + \sum_{A^c} h(x_k) p_k \geq \sum_{A} h(x_k) p_k$ when $A = \{k : h(x_k) \geq \varepsilon\}$: this yields $E\{h(X)\} \geq \varepsilon \sum_{A} p_k = \varepsilon P(h(X) \geq \varepsilon)$ and, subsequently, (6.A.14). ♠

Theorem 6.A.3 If $X$ is a non-negative random variable, then⁶

$$P(X \geq \alpha) \leq \frac{E\{X\}}{\alpha} \qquad (6.A.15)$$

for $\alpha > 0$, which is called the Markov inequality.

The Markov inequality can be proved easily from (6.A.14) by letting $h(X) = |X|$ and $\varepsilon = \alpha$. We can show the Markov inequality also from $E\{X\} = \int_{0}^{\infty} x f_X(x) dx \geq \int_{\alpha}^{\infty} x f_X(x) dx \geq \alpha \int_{\alpha}^{\infty} f_X(x) dx = \alpha P(X \geq \alpha)$ by recollecting that a pdf is non-negative.
Theorem 6.A.4 The mean $E\{Y\}$ and variance $\mathrm{Var}\{Y\}$ of any random variable $Y$ satisfy

$$P(|Y - E\{Y\}| \geq \varepsilon) \leq \frac{\mathrm{Var}\{Y\}}{\varepsilon^2} \qquad (6.A.16)$$

for any $\varepsilon > 0$, which is called the Chebyshev inequality.

Proof The random variable $X = (Y - E\{Y\})^2$ is non-negative. Thus, if we use (6.A.15), we get $P\left( (Y - E\{Y\})^2 \geq \varepsilon^2 \right) \leq \frac{1}{\varepsilon^2} E\left\{ (Y - E\{Y\})^2 \right\} = \frac{\mathrm{Var}\{Y\}}{\varepsilon^2}$. Now, noting that $P\left( (Y - E\{Y\})^2 \geq \varepsilon^2 \right) = P(|Y - E\{Y\}| \geq \varepsilon)$, we get (6.A.16). ♠
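As a concrete check of the Chebyshev inequality (our own sketch): for $Y \sim U(0, 1)$ we have $E\{Y\} = \frac{1}{2}$, $\mathrm{Var}\{Y\} = \frac{1}{12}$, and the tail can be computed in closed form as $P(|Y - \frac{1}{2}| \geq \varepsilon) = \max(0, 1 - 2\varepsilon)$, so the bound and the exact value are easy to compare.

```python
# Chebyshev bound versus exact tail probability for Y ~ U(0, 1)
var = 1 / 12
for eps in (0.1, 0.25, 0.4):
    exact = max(0.0, 1 - 2 * eps)  # P(|Y - 1/2| >= eps), exact for U(0, 1)
    bound = var / eps**2           # Chebyshev upper bound (6.A.16)
    print(eps, exact, bound)
```

The bound is loose for small $\varepsilon$ (it can even exceed 1) but always valid; Chebyshev trades tightness for complete generality.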

Theorem 6.A.5 (Rohatgi and Saleh 2001) The absolute mean $E\{|X|\}$ of any random variable $X$ satisfies

$$\sum_{n=1}^{\infty} P(|X| \geq n) \leq E\{|X|\} \leq 1 + \sum_{n=1}^{\infty} P(|X| \geq n), \qquad (6.A.17)$$

which is called the absolute mean inequality.

Proof Let the pdf of a continuous random variable $X$ be $f_X$. Then, because $E\{|X|\} = \int_{-\infty}^{\infty} |x| f_X(x) dx = \sum_{k=0}^{\infty} \int_{k \leq |x| < k+1} |x| f_X(x) dx$, we have

$$\sum_{k=0}^{\infty} k P(k \leq |X| < k+1) \leq E\{|X|\} \leq \sum_{k=0}^{\infty} (k+1) P(k \leq |X| < k+1). \qquad (6.A.18)$$

Now, employing $\sum_{k=0}^{\infty} k P(k \leq |X| < k+1) = \sum_{n=1}^{\infty} \sum_{k=n}^{\infty} P(k \leq |X| < k+1) = \sum_{n=1}^{\infty} P(|X| \geq n)$ and $\sum_{k=0}^{\infty} (k+1) P(k \leq |X| < k+1) = 1 + \sum_{k=0}^{\infty} k P(k \leq |X| < k+1) = 1 + \sum_{n=1}^{\infty} P(|X| \geq n)$ in (6.A.18), we get (6.A.17). A similar procedure will show the result for discrete random variables. ♠

⁶ The inequality (6.A.15) holds true also when $X$ is replaced by $|X|^r$ for $r > 0$.

Theorem 6.A.6 If $h$ is a convex⁷ function, then

$$E\{h(X)\} \geq h(E\{X\}), \qquad (6.A.19)$$

which is called the Jensen inequality.

Proof Let $m = E\{X\}$. Then, from Taylor's theorem with the Lagrange form of the remainder, we have

$$h(X) = h(m) + (X - m) h'(m) + \frac{1}{2} (X - m)^2 h''(\alpha) \qquad (6.A.20)$$

for some $\alpha$ between $m$ and $X$. Taking the expectation of the above equation, we get $E\{h(X)\} = h(m) + \frac{1}{2} h''(\alpha) \sigma_X^2$. Recollecting that $h''(\alpha) \geq 0$ and $\sigma_X^2 \geq 0$, we get $E\{h(X)\} \geq h(m) = h(E\{X\})$. ♠

Theorem 6.A.7 (Rohatgi and Saleh 2001) If the $n$-th absolute moment $E\{|X|^n\}$ is finite, then

$$\left( E\{|X|^s\} \right)^{\frac{1}{s}} \leq \left( E\{|X|^r\} \right)^{\frac{1}{r}} \qquad (6.A.21)$$

for $1 \leq s < r \leq n$, which is called the Lyapunov inequality.

Proof Consider the bi-variable formula

$$Q(u, v) = \int_{-\infty}^{\infty} \left( u|x|^{\frac{k-1}{2}} + v|x|^{\frac{k+1}{2}} \right)^2 f(x) dx, \qquad (6.A.22)$$

where $f$ is the pdf of $X$. Letting $\beta_n = E\{|X|^n\}$, (6.A.22) can be written as $Q(u, v) = (u \;\; v) \begin{pmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{pmatrix} (u \;\; v)^T$. Now, we have $\begin{vmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{vmatrix} \geq 0$, i.e., $\beta_k^2 \leq \beta_{k-1} \beta_{k+1}$, because $Q \geq 0$ for every choice of $u$ and $v$. Therefore, we have

$$\beta_1^2 \leq \beta_0^1 \beta_2^1, \quad \beta_2^4 \leq \beta_1^2 \beta_3^2, \quad \cdots, \quad \beta_{n-1}^{2(n-1)} \leq \beta_{n-2}^{n-1} \beta_n^{n-1} \qquad (6.A.23)$$

with $\beta_0 = 1$. If we multiply the first $k-1$ consecutive inequalities in (6.A.23), then we have $\beta_{k-1}^{k} \leq \beta_k^{k-1}$ for $k = 2, 3, \ldots, n$, from which we can easily get $\beta_1 \leq \beta_2^{\frac{1}{2}} \leq \beta_3^{\frac{1}{3}} \leq \cdots \leq \beta_n^{\frac{1}{n}}$. ♠

⁷ A function $h$ is called convex or concave up when $h(tx + (1-t)y) \leq th(x) + (1-t)h(y)$ for every two points $x$ and $y$ and for every choice of $t \in [0, 1]$. A convex function is a continuous function with a non-decreasing derivative and is differentiable except at a countable number of points. In addition, the second-order derivative of a convex function, if it exists, is non-negative.
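The monotonicity of the moment norms $\left( E\{|X|^r\} \right)^{1/r}$ in the Lyapunov inequality can be observed on any simple distribution. The sketch below (our own; the choice of $X$ uniform on $\{1, 2, 3\}$ is arbitrary) prints the non-decreasing sequence for $r = 1, 2, 3, 4$.

```python
def abs_moment(r):
    """E{|X|^r} for X uniform on the set {1, 2, 3}."""
    return sum(x**r for x in (1, 2, 3)) / 3

# the r-th moment norms beta_r^(1/r) must be non-decreasing in r
norms = [abs_moment(r) ** (1 / r) for r in (1, 2, 3, 4)]
print(norms)
```

Here the sequence starts at $E\{|X|\} = 2$ and increases toward $\max |X| = 3$, which is the general $r \to \infty$ limit for bounded random variables.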
Theorem 6.A.8 Let $g(x)$ be a non-decreasing and non-negative function for $x \in (0, \infty)$. If $\frac{E\{g(|X|)\}}{g(\varepsilon)}$ is defined, then

$$P(|X| \geq \varepsilon) \leq \frac{E\{g(|X|)\}}{g(\varepsilon)} \qquad (6.A.24)$$

for $\varepsilon > 0$, which is called the generalized Bienayme-Chebyshev inequality.

Proof Let the cdf of $X$ be $F(x)$. Then, we get $E\{g(|X|)\} \geq g(\varepsilon) \int_{|x| \geq \varepsilon} dF(x) = g(\varepsilon) P(|X| \geq \varepsilon)$ by recollecting $E\{g(|X|)\} = \int_{|x| < \varepsilon} g(|x|) dF(x) + \int_{|x| \geq \varepsilon} g(|x|) dF(x) \geq \int_{|x| \geq \varepsilon} g(|x|) dF(x)$. ♠
Letting $g(x) = x^r$ in the generalized Bienayme-Chebyshev inequality, we can easily get the Bienayme-Chebyshev inequality discussed below. In addition, the Chebyshev inequality discussed in Theorem 6.A.4 is a special case of the generalized Bienayme-Chebyshev inequality and of the Bienayme-Chebyshev inequality.
Theorem 6.A.9 When the $r$-th absolute moment $E\{|X|^r\}$ of $X$ is finite, where $r > 0$, we have

$$P(|X| \geq \varepsilon) \leq \frac{E\{|X|^r\}}{\varepsilon^r} \qquad (6.A.25)$$

for $\varepsilon > 0$, which is called the Bienayme-Chebyshev inequality.
(B) Inequalities of Random Vectors
Theorem 6.A.10 (Rohatgi and Saleh 2001) For two random variables $X$ and $Y$, we have

$$E^2\{XY\} \leq E\{X^2\} E\{Y^2\}, \qquad (6.A.26)$$

which is called the Cauchy-Schwarz inequality.

Proof First, note that $E\{|XY|\}$ exists when $E\{X^2\} < \infty$ and $E\{Y^2\} < \infty$ because $|ab| \leq \frac{a^2 + b^2}{2}$ for real numbers $a$ and $b$. Now, if $E\{X^2\} = 0$, then $P(X = 0) = 1$ and thus $E\{XY\} = 0$, implying that (6.A.26) holds true. Next, when $E\{X^2\} > 0$, recollecting that $E\left\{ (\alpha X + Y)^2 \right\} = \alpha^2 E\{X^2\} + 2\alpha E\{XY\} + E\{Y^2\} \geq 0$ for any real number $\alpha$, we have $\frac{E^2\{XY\}}{E\{X^2\}} - 2\frac{E^2\{XY\}}{E\{X^2\}} + E\{Y^2\} \geq 0$ by letting $\alpha = -\frac{E\{XY\}}{E\{X^2\}}$. This inequality is equivalent to (6.A.26). ♠
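The sample analogue of the Cauchy-Schwarz inequality holds for any data set, which makes a quick check easy. In the sketch below (our own; the correlated pair of samples is an arbitrary choice), the sample moments always satisfy the inequality, since it is the vector Cauchy-Schwarz inequality in disguise.

```python
import random

rng = random.Random(3)
xs = [rng.gauss(0, 1) for _ in range(10_000)]
ys = [x + 0.5 * rng.gauss(0, 1) for x in xs]  # correlated with xs
n = len(xs)
exy = sum(x * y for x, y in zip(xs, ys)) / n  # sample E{XY}
ex2 = sum(x * x for x in xs) / n              # sample E{X^2}
ey2 = sum(y * y for y in ys) / n              # sample E{Y^2}
print(exy**2 <= ex2 * ey2)
```

Unlike the probabilistic checks above, this one is deterministic: the inequality holds for every realization, not just in expectation.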
Theorem 6.A.11 (Rohatgi and Saleh 2001) For zero-mean independent random variables $\{X_i\}_{i=1}^{n}$ with variances $\{\sigma_i^2\}_{i=1}^{n}$, let $S_k = \sum_{j=1}^{k} X_j$. Then,

$$P\left( \max_{1 \leq k \leq n} |S_k| > \varepsilon \right) \leq \sum_{i=1}^{n} \frac{\sigma_i^2}{\varepsilon^2} \qquad (6.A.27)$$

for $\varepsilon > 0$, which is called the Kolmogorov inequality.

Proof Let $A_0 = \Omega$, $A_k = \left\{ \max_{1 \leq j \leq k} \left| S_j \right| \leq \varepsilon \right\}$ for $k = 1, 2, \ldots, n$, and $B_k = A_{k-1} \cap A_k^c = \{|S_1| \leq \varepsilon, |S_2| \leq \varepsilon, \ldots, |S_{k-1}| \leq \varepsilon\} \cap \{\text{at least one of } |S_1|, |S_2|, \ldots, |S_k| \text{ is larger than } \varepsilon\}$, i.e.,

$$B_k = \{|S_1| \leq \varepsilon, |S_2| \leq \varepsilon, \ldots, |S_{k-1}| \leq \varepsilon, |S_k| > \varepsilon\}. \qquad (6.A.28)$$

Then, $A_n^c = \bigcup_{k=1}^{n} B_k$ and $B_k \subseteq \{|S_{k-1}| \leq \varepsilon, |S_k| > \varepsilon\}$. Recollecting the indicator function $K_A(x)$ defined in (2.A.27), we get $E\left\{ S_n^2 K_{B_k}(S_k) \right\} = E\left\{ \left[ (S_n - S_k) + S_k \right]^2 K_{B_k}(S_k) \right\}$, i.e.,

$$E\left\{ S_n^2 K_{B_k}(S_k) \right\} = E\left\{ (S_n - S_k)^2 K_{B_k}(S_k) \right\} + E\left\{ S_k^2 K_{B_k}(S_k) \right\} + E\left\{ 2 S_k (S_n - S_k) K_{B_k}(S_k) \right\}. \qquad (6.A.29)$$

Noting that $S_n - S_k = X_{k+1} + X_{k+2} + \cdots + X_n$ and $S_k K_{B_k}(S_k)$ are independent of each other, that $E\{X_k\} = 0$, that $E\left\{ K_{B_k}(S_k) \right\} = P(B_k)$, and that $|S_k| \geq \varepsilon$ under $B_k$, we have $E\left\{ S_n^2 K_{B_k}(S_k) \right\} = E\left\{ (S_n - S_k)^2 K_{B_k}(S_k) \right\} + E\left\{ S_k^2 K_{B_k}(S_k) \right\} \geq E\left\{ S_k^2 K_{B_k}(S_k) \right\}$, i.e.,

$$E\left\{ S_n^2 K_{B_k}(S_k) \right\} \geq \varepsilon^2 P(B_k) \qquad (6.A.30)$$

from (6.A.29). Subsequently, using $\sum_{k=1}^{n} E\left\{ S_n^2 K_{B_k}(S_k) \right\} = E\left\{ S_n^2 K_{A_n^c}(S_n) \right\} \leq E\left\{ S_n^2 \right\} = \sum_{k=1}^{n} \sigma_k^2$ and (6.A.30), we get $\sum_{k=1}^{n} \sigma_k^2 \geq \varepsilon^2 \sum_{k=1}^{n} P(B_k) = \varepsilon^2 P\left( A_n^c \right)$, which is the same as (6.A.27). ♠

Example 6.A.7 (Rohatgi and Saleh 2001) The Chebyshev inequality (6.A.16) with $E\{Y\} = 0$, i.e.,

$$P(|Y| > \varepsilon) \leq \frac{\mathrm{Var}\{Y\}}{\varepsilon^2}, \qquad (6.A.31)$$

is the same as the Kolmogorov inequality (6.A.27) with $n = 1$. ♦
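The Kolmogorov inequality bounds the whole path of partial sums, not just the endpoint. The simulation below (our own sketch; the $\pm 0.2$ step size and parameters are arbitrary) estimates $P\left( \max_{1 \leq k \leq n} |S_k| > \varepsilon \right)$ for a symmetric random walk and compares it with $\sum_{i=1}^{n} \sigma_i^2 / \varepsilon^2$.

```python
import random

rng = random.Random(4)
n, trials, eps = 50, 4000, 2.0
exceed = 0
for _ in range(trials):
    s, peak = 0.0, 0.0
    for _ in range(n):
        s += rng.choice((-0.2, 0.2))  # zero-mean step, variance 0.04
        peak = max(peak, abs(s))
    exceed += peak > eps
freq = exceed / trials               # empirical P(max_k |S_k| > eps)
bound = n * 0.04 / eps**2            # Kolmogorov bound: sum of variances / eps^2
print(freq, bound)
```

The empirical frequency sits well below the bound of $0.5$, consistent with the inequality being an upper bound rather than an approximation.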

Theorem 6.A.12 Consider i.i.d. random variables $\{X_i\}_{i=1}^{n}$ with marginal mgf $M(t) = E\left\{ e^{tX_i} \right\}$. Let $Y_n = \sum_{i=1}^{n} X_i$ and $g(t) = \ln M(t)$. If we let the solution to $\alpha = n g'(t)$ be $t_r$ for a real number $\alpha$, then

$$P(Y_n \geq \alpha) \leq \exp\left\{ -n \left( t_r g'(t_r) - g(t_r) \right) \right\}, \quad t_r \geq 0 \qquad (6.A.32)$$

and

$$P(Y_n \leq \alpha) \leq \exp\left\{ -n \left( t_r g'(t_r) - g(t_r) \right) \right\}, \quad t_r \leq 0. \qquad (6.A.33)$$

The inequalities (6.A.32) and (6.A.33) are called the Chernoff bounds.

When $t_r = 0$, the right-hand sides of the two inequalities (6.A.32) and (6.A.33) are both 1 from $g(t_r) = \ln M(t_r) = \ln M(0) = 0$: in other words, the Chernoff bounds simply say that the probability is no larger than 1 when $t_r = 0$, and thus the Chernoff bounds are more useful when $t_r \neq 0$.

Example 6.A.8 (Thomas 1986) Let X ∼ N(0, 1), n = 1, and Y1 = X. From the mgf M(t) = exp(t²/2), we get g(t) = ln M(t) = t²/2 and g′(t) = t. Thus, the solution to α = ng′(t) = t is tr = α. In other words, the Chernoff bounds can be written as

P(X ≥ α) ≤ exp(−α²/2), α ≥ 0 (6.A.34)

and

P(X ≤ α) ≤ exp(−α²/2), α ≤ 0 (6.A.35)

for X ∼ N(0, 1).
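As a numerical sanity check (our own illustration, not part of the text), the exact standard normal tail Q(α) = ½ erfc(α/√2) can be compared against the bound (6.A.34) for a few values of α:

```python
import math

# The Chernoff bound exp(-a^2/2) dominates the exact Gaussian tail P(X >= a).
for a in (0.5, 1.0, 2.0, 3.0):
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))  # exact P(X >= a), X ~ N(0, 1)
    bound = math.exp(-a * a / 2.0)              # right-hand side of (6.A.34)
    assert tail <= bound
```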


Example 6.A.9 For X ∼ P(λ), assume n = 1 and Y1 = X. From the mgf M(t) = exp{λ(e^t − 1)}, we get g(t) = ln M(t) = λ(e^t − 1) and g′(t) = λe^t. Solving α = ng′(t) = λe^t, we get tr = ln(α/λ). Thus, tr > 0 when α > λ, tr = 0 when α = λ, and tr < 0 when α < λ. Therefore, we have

P(X ≥ α) ≤ e^{−λ} (eλ/α)^α, α ≥ λ (6.A.36)

and

P(X ≤ α) ≤ e^{−λ} (eλ/α)^α, α ≤ λ (6.A.37)

because n(tr g′(tr) − g(tr)) = α ln(α/λ) − α + λ from g(tr) = λ(α/λ − 1) = α − λ and tr g′(tr) = α ln(α/λ).
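As an illustration of our own (the value of λ and the choices of α are arbitrary), the bound (6.A.36) can be verified against the exact Poisson tail computed from the pmf:

```python
import math

# Exact tail P(X >= a) for X ~ P(lam) versus the bound e^{-lam}(e*lam/a)^a, a >= lam.
lam = 4.0
for a in (5, 8, 12):
    tail = 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(a))
    bound = math.exp(-lam) * (math.e * lam / a) ** a
    assert tail <= bound
```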
Theorem 6.A.13 If p and q are both larger than 1 and 1/p + 1/q = 1, then

E{|XY|} ≤ E^{1/p}{|X|^p} E^{1/q}{|Y|^q}, (6.A.38)

which is called the Hölder inequality.

Theorem 6.A.14 If p > 1, then

E^{1/p}{|X + Y|^p} ≤ E^{1/p}{|X|^p} + E^{1/p}{|Y|^p}, (6.A.39)

which is called the Minkowski inequality.

It is easy to see that the Minkowski inequality is a generalization of the triangle inequality |a − b| ≤ |a − c| + |c − b|.
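A quick empirical check of the Hölder inequality for p = q = 2 (the Cauchy–Schwarz case) can be done with sample averages in place of expectations; the sample distributions below are our own arbitrary choices:

```python
import random

# Sample-average version of (6.A.38) with p = q = 2: it holds exactly for the
# empirical distribution of any sample, by the Cauchy-Schwarz inequality.
random.seed(7)
xs = [random.uniform(-1.0, 1.0) for _ in range(1000)]
ys = [random.gauss(0.0, 2.0) for _ in range(1000)]
n = len(xs)
lhs = sum(abs(x * y) for x, y in zip(xs, ys)) / n
rhs = (sum(x * x for x in xs) / n) ** 0.5 * (sum(y * y for y in ys) / n) ** 0.5
assert lhs <= rhs
```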

Exercises

Exercise 6.1 For the sample space [0, 1], consider a sequence of random variables defined by

Xn(ω) = 1 for ω ≤ 1/n and Xn(ω) = 0 for ω > 1/n, (6.E.1)

and let X(ω) = 0 for ω ∈ [0, 1]. Assume the probability measure P(a ≤ ω ≤ b) = b − a, the Lebesgue measure mentioned following (2.5.24), for 0 ≤ a ≤ b ≤ 1. Discuss if {Xn(ω)}_{n=1}^{∞} converges to X(ω) surely or almost surely.

Exercise 6.2 For the sample space [0, 1], consider the sequence

Xn(ω) = 3 for 0 ≤ ω < 1/(2n), 4 for 1 − 1/(2n) < ω ≤ 1, and 5 for 1/(2n) ≤ ω ≤ 1 − 1/(2n), (6.E.2)

and let X(ω) = 5 for ω ∈ [0, 1]. Assuming the probability measure P(a ≤ ω ≤ b) = b − a for 0 ≤ a ≤ b ≤ 1, discuss if {Xn(ω)}_{n=1}^{∞} converges to X(ω) surely or almost surely.
Exercise 6.3 When {Xi}_{i=1}^{n} are independent random variables, obtain the distribution of Sn = Σ_{i=1}^{n} Xi in each of the following five cases of the distribution of Xi.
(1) geometric distribution with parameter α,
(2) NB(ri, α),
(3) P(λi),
(4) G(αi, β), and
(5) C(μi, θi).
Exercise 6.4 To what does Sn/n converge in Example 6.2.10?
Exercise 6.5 Let Y = (X − λ)/√λ for a Poisson random variable X ∼ P(λ). Noting that the mgf of X is MX(t) = exp{λ(e^t − 1)}, show that Y converges to a standard normal random variable as λ → ∞.
Exercise 6.6 For a sequence {Xn}_{n=1}^{∞} with the pmf

P(Xn = x) = 1/n for x = 1 and 1 − 1/n for x = 0, (6.E.3)

show that Xn →ˡ X, where X has the distribution P(X = 0) = 1.
Exercise 6.7 Discuss if the weak law of large numbers holds true for a sequence of i.i.d. random variables with marginal pdf f(x) = {(1 + α)/x^{2+α}} u(x − 1), where α > 0.

Exercise 6.8 Show that Sn = Σ_{k=1}^{n} Xk converges to a Poisson random variable with distribution P(np) when n → ∞ for an i.i.d. sequence {Xn}_{n=1}^{∞} with marginal distribution b(1, p).

Exercise 6.9 Discuss the central limit theorem for an i.i.d. sequence {X i }i=1 with
marginal distribution B(α, β).
Exercise 6.10 An i.i.d. sequence {Xi}_{i=1}^{n} has marginal distribution P(λ). When n is large enough, we can approximate as Sn = Σ_{k=1}^{n} Xk ∼ N(nλ, nλ). Using the continuity correction, obtain the probability P(50 < Sn ≤ 80).
Exercise 6.11 Consider an i.i.d. Bernoulli sequence {Xi}_{i=1}^{n} with P(Xi = 1) = p, a binomial random variable M ∼ b(n, p) which is independent of {Xi}_{i=1}^{n}, and K = Σ_{i=1}^{n} Xi. Note that K is the number of successes in n i.i.d. Bernoulli trials. Obtain the expected values of U = Σ_{i=1}^{K} Xi and V = Σ_{i=1}^{M} Xi.

Exercise 6.12 The result of a game is independent of another game, and the probabilities of winning and losing are each 1/2. Assume there is no tie. When a person wins a round, the person gets 2 points and then continues; if the person loses a round, the person gets 0 points and stops. Obtain the mgf, expected value, and variance of the score Y that the person may get from the games.

Exercise 6.13 Let Pn be the probability that we have more heads than tails in a toss of n fair coins.
(1) Obtain P3, P4, and P5.
(2) Obtain the limit lim_{n→∞} Pn.

Exercise 6.14 For an i.i.d. sequence {Xn ∼ N(0, 1)}_{n=1}^{∞}, let the cdf of X̄n = (1/n) Σ_{i=1}^{n} Xi be Fn. Obtain lim_{n→∞} Fn(x) and discuss whether the limit is a cdf or not.

Exercise 6.15 Consider X[1] = min(X1, X2, . . . , Xn) for an i.i.d. sequence {Xn ∼ U(0, θ)}_{n=1}^{∞}. Does Yn = nX[1] converge in distribution? If yes, obtain the limit cdf.

Exercise 6.16 The marginal cdf F of an i.i.d. sequence {Xn}_{n=1}^{∞} is absolutely continuous. For the sequence {Yn}_{n=1}^{∞} = {n{1 − F(Mn)}}_{n=1}^{∞}, obtain the limit lim_{n→∞} FYn(y) of the cdf FYn of Yn, where Mn = max(X1, X2, . . . , Xn).

Exercise 6.17 Is the sequence of cdf's

Fn(x) = 0 for x < 0, 1 − 1/n for 0 ≤ x < n, and 1 for x ≥ n (6.E.4)

convergent? If yes, obtain the limit.

Exercise 6.18 In the sequence {Yi = X + Wi}_{i=1}^{n}, X and {Wi}_{i=1}^{n} are independent of each other, and {Wi ∼ N(0, σi²)}_{i=1}^{n} is an i.i.d. sequence, where σi² ≤ σ²max < ∞. We estimate X via

X̂n = (1/n) Σ_{i=1}^{n} Yi (6.E.5)

and let the error be εn = X̂n − X.
(1) Express the cf, mean, and variance of X̂n in terms of those of X and {Wi}_{i=1}^{n}.
(2) Obtain the covariance Cov(Yi, Yj).
(3) Obtain the pdf f_{εn}(α) and the conditional pdf f_{X̂|X}(α|β).
(4) Does X̂n converge to X? If yes, what is the type of the convergence? If not, what is the reason?


Exercise 6.19 Assume an i.i.d. sequence {Xi}_{i=1} with marginal pdf f(x) = e^{−x+θ} u(x − θ). Show that

min(X1, X2, . . . , Xn) →ᵖ θ (6.E.6)

and that

Y →ᵖ 1 + θ (6.E.7)

for Y = (1/n) Σ_{i=1}^{n} Xi.

Exercise 6.20 Show that max(X1, X2, . . . , Xn) →ᵖ θ for an i.i.d. sequence {Xi}_{i=1} with marginal distribution U(0, θ).

Exercise 6.21 Assume an i.i.d. sequence {Xn}_{n=1}^{∞} with marginal cdf

F(x) = 0 for x ≤ 0, x for 0 < x ≤ 1, and 1 for x > 1. (6.E.8)

Let {Yn}_{n=1}^{∞} and {Zn}_{n=1}^{∞} be defined by Yn = max(X1, X2, . . . , Xn) and Zn = n(1 − Yn). Show that the sequence {Zn}_{n=1}^{∞} converges in distribution to a random variable Z with cdf F(z) = (1 − e^{−z}) u(z).

Exercise 6.22 For the sample space Ω = {1, 2, . . .} and probability measure P(n) = α/n², assume a sequence {Xn}_{n=1}^{∞} such that

Xn(ω) = n for ω = n and 0 for ω ≠ n. (6.E.9)

Show that, as n → ∞, {Xn}_{n=1}^{∞} converges to X = 0 almost surely, but does not converge to X = 0 in the mean square, i.e., E{(Xn − 0)²} ↛ 0.

Exercise 6.23 The second moment of an i.i.d. sequence {Xi}_{i=1} is finite. Show that Yn →ᵖ E{X1} for Yn = {2/(n(n + 1))} Σ_{i=1}^{n} iXi.
i=1

Exercise 6.24 For a sequence {Xn}_{n=1}^{∞} with

P(Xn = x) = 1 − 1/n for x = 0 and 1/n for x = 1, (6.E.10)

we have Xn →^{r=2} 0 because lim_{n→∞} E{Xn²} = lim_{n→∞} 1/n = 0. Show that the sequence {Xn}_{n=1}^{∞} does not converge almost surely.

Exercise 6.25 Consider a sequence {Xi}_{i=1}^{n} with a finite common variance σ². When the correlation coefficient between Xi and Xj is negative for every i ≠ j, show that the sequence {Xi}_{i=1}^{n} follows the weak law of large numbers. (Hint. Assume Yn = (1/n) Σ_{k=1}^{n} (Xk − mk) for a sequence {Xi}_{i=1} with mean E{Xi} = mi. Then, it is known that a necessary and sufficient condition for {Xi}_{i=1} to satisfy the weak law of large numbers is that

E{Yn²/(1 + Yn²)} → 0 (6.E.11)

as n → ∞.)

Exercise 6.26 For an i.i.d. sequence {Xi}_{i=1}^{n}, let E{Xi} = μ, Var{Xi} = σ², and E{Xi⁴} < ∞. Find the constants an and bn such that (Vn − an)/bn →ˡ Z for Vn = Σ_{k=1}^{n} (Xk − μ)², where Z ∼ N(0, 1).

Exercise 6.27 When the sequence {Xk}_{k=1}^{∞} with P(Xk = ±k^α) = 1/2 satisfies the strong law of large numbers, obtain the range of α.

Exercise 6.28 Assume a Cauchy random variable X with pdf fX(x) = a/{π(x² + a²)}.
(1) Show that the cf is

φX(w) = e^{−a|w|}. (6.E.12)

(2) Show that the sample mean of n i.i.d. Cauchy random variables is a Cauchy random variable.

Exercise 6.29 Assume an i.i.d. sequence {Xi ∼ P(0.02)}_{i=1}^{100}. For S = Σ_{i=1}^{100} Xi, obtain the value P(S ≥ 3) using the central limit theorem and compare it with the exact value.

Exercise 6.30 Consider the sequence of cdf's

Fn(x) = 0 for x ≤ 0, x{1 − sin(2nπx)/(2nπx)} for 0 < x ≤ 1, and 1 for x ≥ 1, (6.E.13)

among which four are shown in Fig. 6.3. Obtain lim_{n→∞} Fn(x) and discuss if (d/dx) lim_{n→∞} Fn(x) is the same as lim_{n→∞} {(d/dx) Fn(x)}.

[Fig. 6.3 Four cdf's Fn(x) = x{1 − sin(2nπx)/(2nπx)} for n = 1, 4, 16, and 64 on x ∈ [0, 1]]

Exercise 6.31 Assume an i.i.d. sequence {Xi ∼ χ²(1)}_{i=1}^{∞}. Then, we have Sn ∼ χ²(n), E{Sn} = n, and Var{Sn} = 2n. Thus, letting Zn = (Sn − n)/√(2n) = √(n/2) (Sn/n − 1), the mgf Mn(t) = E{e^{tZn}} = exp(−t√(n/2)) {1 − t√(2/n)}^{−n/2} of Zn can be obtained as

Mn(t) = [exp(t√(2/n)) − t√(2/n) exp(t√(2/n))]^{−n/2} (6.E.14)

for t < √(n/2). In addition, from Taylor approximation, we get

exp(t√(2/n)) = 1 + t√(2/n) + (t²/2)(√(2/n))² + (t³/6)(√(2/n))³ exp(θn) (6.E.15)

for 0 < θn < t√(2/n). Show that Zn →ˡ Z ∼ N(0, 1).

Exercise 6.32 In a soccer game, the number N of shootings of a player is a Poisson random variable with mean μ = 12. The probability of a goal for a shooting is 1/8 and is independent of N. Obtain the distribution, mean, and variance of the number of goals.

Exercise 6.33 For a facsimile (fax), the number W of pages sent is a geometric random variable with pmf pW(k) = 3^{k−1}/4^k for k ∈ {1, 2, . . .} and mean 1/β = 4. The amount Bi of information contained in the i-th page is a geometric random variable with pmf pB(k) = 10^{−5}(1 − 10^{−5})^{k−1} for k ∈ {1, 2, . . .} with expected value 1/α = 10⁵. Assuming that {Bi}_{i=1}^{∞} is an i.i.d. sequence and that W and {Bi}_{i=1}^{∞} are independent of each other, obtain the distribution of the total amount of information sent via this fax.

Exercise 6.34 Consider a sequence {Xi}_{i=1} of i.i.d. exponential random variables with mean 1/λ. A geometric random variable N has mean 1/p and is independent of {Xi}_{i=1}^{∞}. Obtain the expected value and variance of the random sum SN = Σ_{i=1}^{N} Xi.

Exercise 6.35 Depending on the weather, the number N of icicles has the pmf pN(n) = (1/10) 2^{2−|3−n|} for n = 1, 2, . . . , 5, and the lengths {Li}_{i=1} of the icicles are i.i.d. with marginal pdf fL(v) = λe^{−λv} u(v). In addition, N and {Li}_{i=1}^{∞} are independent of each other. Obtain the expected value of the sum T of the lengths of the icicles.

Exercise 6.36 Check if the following sequences of cdf's are convergent, and if yes, obtain the limit:
(1) sequence {Fn(x)}_{n=1}^{∞} with cdf

Fn(x) = 0 for x < −n, (x + n)/(2n) for −n ≤ x < n, and 1 for x ≥ n. (6.E.16)

(2) sequence {Fn(x)}_{n=1}^{∞} such that Fn(x) = F(x + n) for a continuous cdf F(x).
(3) sequence {Gn(x)}_{n=1}^{∞} such that Gn(x) = F(x + (−1)ⁿ n) for a continuous cdf F(x).

Exercise 6.37 For the sequence {Xn ∼ G(n, β)}_{n=1}^{∞}, obtain the limit distribution of Yn = Xn/(2n).

References

E.F. Beckenbach, R. Bellman, Inequalities (Springer, Berlin, 1965)


W.B. Davenport Jr., Probability and Random Processes (McGraw-Hill, New York, 1970)
J.L. Doob, Heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Stat. 20(3), 393–
403 (1949)
W. Feller, An Introduction to Probability Theory and Its Applications, 3rd edn. revised printing
(Wiley, New York, 1970)
W.A. Gardner, Introduction to Random Processes with Applications to Signals and Systems, 2nd
edn. (McGraw-Hill, New York, 1990)
R.M. Gray, L.D. Davisson, An Introduction to Statistical Signal Processing (Cambridge University
Press, Cambridge, 2010)
G. Grimmett, D. Stirzaker, Probability and Random Processes (Oxford University, London, 1982)
A. Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, 3rd edn.
(Prentice Hall, New York, 2008)
G.A. Mihram, A cautionary note regarding invocation of the central limit theorem. Am. Stat. 23(5),
38 (1969)
V.K. Rohatgi, A.K.Md.E. Saleh, An Introduction to Probability and Statistics, 2nd edn. (Wiley, New York, 2001)
J.M. Stoyanov, Counterexamples in Probability, 3rd edn. (Dover, New York, 2013)
J.B. Thomas, Introduction to Probability (Springer, New York, 1986)
R.D. Yates, D.J. Goodman, Probability and Stochastic Processes (Wiley, New York, 1999)
Answers to Selected Exercises

Chapter 1 Preliminaries

Exercise 1.3 A − B = A(A ∩ B). A ∪ B = AB(A ∩ B).


Exercise 1.6 The set of polynomials with integer coefficients is countable.
Exercise 1.7 The set of algebraic numbers is countable.
Exercise 1.9 The collection of all non-overlapping open intervals with real end points
is countable.
Exercise 1.10 (1) f (n, m) = 2n 3m . (2) f (x) = 21 + π1 arctan(x).
(3) Let 0.α1 α2 · · · be the binary expression of the real number 21 + π1 arctan(x). Then,
1
2
+ π1 arctan(x) → (α1 , α2 , . . .), αi ∈ {0, 1}.
(4) infinite sequence 0 , 1 , . . . → ternary number 0. (20 ) (21 ) · · · , where i ∈
{0, 1}.
(5) sequence n 0 , n 1 , . . . of natural numbers → sequence n 0 + 1 0 s, 1, n 0 + 1 0 s, 1, · · · .
( j) ( j) ( j) ( j)
(6) Let x j = ± · · · α−2 α−1 .α1 α2 · · · be the binary representation of a real number
( j)
x j . Let α0 = 1 and 0 when x j > 0 and < 0, respectively.
sequence S = (x0 , x1 , . . .) → sequence α0(0) , α−1 (0)
, α0(1) , α1(0) , α−2
(0) (1)
, α−1 , α0(2) ,
α1(1) , α2(0) , α−3(0)
, α−2 (1)
, α−1(2)
, α0(3) , α1(2) , α2(1) , α3(0) , . . ..
Exercise 1.11 (1) f (n) = (k, l), where n = 2k × 3l · · · is the factorization of n in
prime factors.
(2) f (n) = (−1)
k

m+1
l
, where n = 2k × 3l × 5m · · · is the factorization of n in prime
factors.
(3) For an element x = 0.α1 α2 · · · of the Cantor set, let f (x) = 0. α21 α22 · · · .
(4) a sequence (α1 , α2 , . . .) of 0 and 1 → the number 0.α1 α2 · · · .
Exercise 1.12 (1) When two intervals (a, b) and (c, d) are both finite, f(x) = c + (d − c)(x − a)/(b − a).
When a is finite, b = ∞, and (c, d) is finite, f(x) = c + (2/π)(d − c) arctan(x − a).
Similarly in other cases.
(2) S1 → S2 , where S2 is an infinite sequence of 0 and 1 obtained by replacing 1 with
(1, 0) and 2 with (1, 1) in an infinite sequence S1 = (a0 , a1 , . . .) of 0, 1, and 2.

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2022
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8

(3) Denote a number x ∈ [0, 1) by x = 0.a1a2··· in the decimal system. Let us make consecutive 9's together with the immediately following non-9 digit into a group and each other digit into a group. For example, we have x = 0.(1)(2)(97)(9996)(6)(5)(99997)(93)···. Write the number as x = 0.(x1)(x2)(x3)···. Then, letting y = 0.(x1)(x3)(x5)··· and z = 0.(x2)(x4)(x6)···, f(x) = (y, z) is the desired one-to-one correspondence.
Exercise 1.14 from (m, n) to k: k = g(m, n) = m + (1/2)(m + n)(m + n + 1).
from k to (m, n): m = k − (1/2)a(a + 1) and n = a − m, where a is an integer such that a(a + 1) ≤ 2k < (a + 1)(a + 2).
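The pairing above can be exercised directly in code. The sketch below is our own rendering of the answer's formulas (the helper names g and g_inv are ours):

```python
import math

# k = g(m, n) = m + (m + n)(m + n + 1)/2, inverted via the integer a with
# a(a + 1) <= 2k < (a + 1)(a + 2).
def g(m, n):
    return m + (m + n) * (m + n + 1) // 2

def g_inv(k):
    a = (math.isqrt(8 * k + 1) - 1) // 2  # largest a with a(a + 1) <= 2k
    m = k - a * (a + 1) // 2
    return m, a - m

# round-trip check on a small grid
for m in range(20):
    for n in range(20):
        assert g_inv(g(m, n)) == (m, n)
```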
Exercise 1.15 The collection of intervals with rational end points in the space R of
real numbers is countable.
Exercise 1.17 It is a rational number. It is a rational number. It is not always a rational
number.
Exercise 1.18 (1) (a + b)/2. (2) (a + b)/2. (3) a + (b − a)/√2. (4) c = (2a + b)/3 and/or d = (a + 2b)/3.
(5) Assume a = 0.a1a2··· and b = 0.b1b2···, where ai ∈ {0, 1, . . . , 9} and bi ∈ {0, 1, . . . , 9}. Let k = arg min_i {bi > ai} and l = arg min_{i>k} {bi > 0}. Then, the number c = 0.c1c2···ck···cl−1cl = 0.b1b2···bk···bl−1(bl − 1).
(6) The number c that can be found by the procedure, after replacing a with g, in (5), where g = (a + b)/2.
Exercise 1.19 Let A = {ai}_{i=0}^{∞} and ai = 0.α1^(i)α2^(i)α3^(i)···. Assume the second player chooses yj = 4 when α_{2j}^{(j)} ≠ 4 and yj = 6 when α_{2j}^{(j)} = 4. Then, for any sequence x1, x2, . . . of numbers the first player has chosen, the second player wins because the number 0.x1y1x2y2··· is not the same as any number aj and thus is not included in A.
Exercise 1.25 (1) u(ax + b) = u(x + b/a)u(a) + u(−x − b/a)u(−a).
u(sin x) = Σ_{n=−∞}^{∞} {u(x − 2nπ) − u(x − (2n + 1)π)}.
u(e^x − π) = 0 for x < ln π and 1 for x > ln π, i.e., u(e^x − π) = u(x − ln π).
(2) ∫_{−∞}^{x} u(t − y) dt = (x − y)u(x − y).
Exercise 1.27 δ′(x) cos x = δ′(x).
Exercise 1.28 ∫_{−2π}^{2π} e^{πx} δ(x² − π²) dx = cosh(π²)/π. δ(sin x) = δ(x) + δ(x − π).
Exercise 1.29 ∫_{−∞}^{∞} (cos x + sin x) δ′(x³ + x² + x) dx = 1.
Exercise 1.31
(1) {(1 + 1/n, 2)}_{n=1}^{∞} → (1, 2). (2) {[1 + 1/n, 2]}_{n=1}^{∞} → (1, 2].
(3) {(1, 1 + 1/n]}_{n=1}^{∞} → (1, 1] = ∅. (4) {[1, 1 + 1/n]}_{n=1}^{∞} → [1, 1] = {1}.
(5) {(1 − 1/n, 2)}_{n=1}^{∞} → [1, 2). (6) {[1 − 1/n, 2]}_{n=1}^{∞} → [1, 2].
(7) {(1, 2 − 1/n)}_{n=1}^{∞} → (1, 2). (8) {[1, 2 − 1/n]}_{n=1}^{∞} → [1, 2).
Exercise 1.32 ∫_{0}^{1} lim_{n→∞} fn(x) dx = 0. lim_{n→∞} ∫_{0}^{1} fn(x) dx = 1/2.
Exercise 1.33 ∫_{0}^{b} lim_{n→∞} fn(x) dx = 0. lim_{n→∞} ∫_{0}^{b} fn(x) dx = ∞.

Exercise 1.34 The number of all possible arrangements with ten distinct red balls
and ten distinct black balls = 20! ≈ 2.43 × 1018 .
Exercise 1.41 When p > 0, pC0 − pC1 + pC2 − pC3 + · · · = 0.
When p > 0, pC0 + pC1 + pC2 + pC3 + · · · = 2^p.
When p > 0, Σ_{k=0}^{∞} pC_{2k+1} = Σ_{k=0}^{∞} pC_{2k} = 2^{p−1}.
Exercise 1.42 (1 + z)^{1/2} = 1 + z/2 − z²/8 + z³/16 − · · · .
(1 + z)^{−1/2} = 1 − z/2 + (3/8)z² − (5/16)z³ + (35/128)z⁴ − · · · for |z| < 1,
and (1 + z)^{−1/2} = z^{−1/2} − (1/2)z^{−3/2} + (3/8)z^{−5/2} − (5/16)z^{−7/2} + (35/128)z^{−9/2} − · · · for |z| > 1.

Chapter 2 Fundamentals of Probability

Exercise 2.1 F (C) = {∅, {a}, {b}, {a, b}, {c, d}, {b, c, d}, {a, c, d}, S}.
Exercise 2.2 σ (C) = {∅, {a}, {b}, {a, b}, {c, d}, {b, c, d}, {a, c, d}, S}.
Exercise 2.3 (1) Denoting the lifetime of the battery by t, S = {t : 0 ≤ t < ∞}.
(2) S = {(n, m) : (0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)}.
(3) S = {(1, red), (2, red), (3, green), (4, green), (5, blue)}.
Exercise 2.4 P ( AB c + B Ac ) = 0 when P(A) = P(B) = P(AB).
Exercise 2.5 P(A ∪ B) = 1/2. P(A ∪ C) = 2/3. P(A ∪ B ∪ C) = 1.
Exercise 2.6 (1) C = Ac ∩ B.
Exercise 2.7 The probability that red balls and black balls are placed in an alternating fashion = (2 × 10! × 10!)/20! ≈ 1.08 × 10⁻⁵.
Exercise 2.8 P(two nodes are disconnected) = p 2 (2 − p).
Exercise 2.12 Buying 50 tickets in one week brings us a higher probability of getting the winning ticket than buying one ticket over 50 weeks: 1/2 versus 1 − (99/100)⁵⁰ ≈ 0.395.
Exercise 2.13 1/6.
Exercise 2.14 1/2.
Exercise 2.15 P(C ∩ (A − B)ᶜ = ∅) = (5/8)ᵏ.
Exercise 2.16 pn,A = −(1/4)(−1/3)^{n−1} + 1/4. pn,B = (1/12)(−1/3)^{n−1} + 1/4.
p10,A = (1/4){1 − (−1/3)⁹} ≈ 0.250013. p10,B = (1/4){1 − (−1/3)¹⁰} ≈ 0.249996.
Exercise
 2.17 (1), (2) probability of no match
0, N = 1,
= 1 (−1) N
2!
− 3! + · · · + N ! , N = 2, 3, . . . .
1

(3)⎧probability of k matches 
⎪ (−1) N −k
⎨ k!1 2!1 − 3!1 + · · · + (N −k)! , k = 0, 1, . . . , N − 2,
= 0, k = N − 1,

⎩1
k!
, k = N.
464 Answers to Selected Exercises

Exercise 2.18 43 .
Exercise 2.19 (1) α = p 2 . P((k, m) : k ≥ m) = 2−1 p .
(2) P((k, m) : k + m = r ) = p 2 (1 − p)r −2 (r − 1).
(3) P((k, m) : k is an odd number) = 2−1 p .
Exercise 2.20 P(A ∩ B) = 10 3
. P(A|B) = 35 . P(B|A) = 37 .
Exercise 2.21 Probability that only two will hit the target = 1000 398
.
Exercise 2.22 (4) P (B | A) = 1 − p or
c 1−s−q+qr
r
. P (B |A c
) = 1 − q or s−r
1−r
p
.
(1−q)(1−r ) (1− p)r
P (A | B) =
c
s
or s . P ( A |B ) = 1−s or 1−s .
s−r p c 1−s−q+qr

(5) S = {A defective element is identified to be defective, A defective element is


identified to be functional, A functional element is identified to be defective, A
functional element is identified to be functional}.  
Exercise 2.25 (1) Ai and A j are independent if P ( A  i ) = 0 or P A j = 0.
Ai and A j are not independent if P ( Ai ) = 0 and P A j = 0.
(2) partition of B = {B A1 , B A2 , . . . , B An }.
Exercise 2.26 P(red ball) = 25 × 21 + 15 × 21 = 10 3
.
Exercise 2.27 P(red ball) = 3n . 2n+1
 
n k−1 n
Exercise 2.28 pn,k = P(ends in k trials) = 1 − 3n−1 3n−1
.
Exercise 2.29 Probability of Candidate A leading always = n−m .
 1 2n −1 n+m
Exercise 2.30 β0 = 4 .
n−k
Exercise 2.31 P(A) = n Ck 23n .
Exercise 2.32 (1) p10 = 1 − p11 . p01 = 1 − p00 .
(2) P(error) = (1 − p00 ) (1 − p) + (1 − p11 ) p.
(3) P(Y = 1) = pp11 + (1 − p) (1 − p00 ). P(Y = 0) = p (1 − p11 ) +
(1 − p) p00 . ⎧ pp11

⎪ pp11 +(1− p)(1− p00 )
, j = 1, k = 1,

⎨ p(1− p11 )
p11 )+(1− p) p00
, j = 1, k = 0,
(4) P ( X = j| Y = k) = p(1− (1− p)(1− p00 )
⎪ pp11 +(1− p)(1− p00 ) j = 0, k = 1,
⎪ ,

⎩ (1− p) p00
p(1− p11 )+(1− p) p00
, j = 0, k = 0.
(5) P (Y = 1) = P (Y = 0) = 2 . 1

P ( X = 1| Y = 0) = P ( X = 0| Y = 1) = 1 − p11 .
P ( X = 1| Y = 1) = P ( X = 0| Y = 0) = p11 .
Exercise 2.33 (1) α1,1 = m(m−1) n(n−1)
. α0,1 = m(n−m)
n(n−1)
. α1,0 = m(n−m)
n(n−1)
.
(n−m)(n−m−1)
α0,0 = n(n−1)
.
. α̃0,0 = (n−m)
2 2
(2) α̃1,1 = mn 2 . α̃0,1 = m(n−m)
n2
. α̃1,0 = m(n−m)
n2 n2
.
(n−m)(n−m−1)
(3) β0 = n(n−1)
. β1 = n(n−1) . β2 = n(n−1) .
2m(n−m) m(m−1)

Exercise 2.34 (1) P2 = 280 15


. P0 = 280143
. P1 = 122
280
. (2) P3 = 15 8
.
(1− p)( pb) r
Exercise 2.35 (1) P(r brown eye children) = (1− p+ pb) r +1 .
r
(2) P(r boys) = 2(1− p) p
(2− p)r +1
.
(3) P(at least two boys|at least one boy) = p
2− p
.

k N
q q

Exercise 2.37 pk =
p p
q
N .
1− p

Exercise 2.38 (1) p1 = 10 3


. (2) p2 = 10
7
. (3) P(15 red, 10 white|red flower) = 21 .
Exercise 2.39 c = 15 .
Exercise 2.41 (2) 3c11 = 2c12 + c13 .
Exercise 2.42 (1) B1 and R are independent of each other. B1 and G are independent
of each other.
(2) B2 and R are not independent of each other. B3 and G are not independent of
each other.
Exercise 2.43 A1 , A2 , and A3 are not mutually independent.
Exercise 2.44 A and C are not independent of each other.
60×60−50× 50 ×2
Exercise 2.45 Probability of meeting = 60×60
2
= 11
36
≈ 0.3056.
Exercise 2.46 p1 = 2 . p2 = 3 .
1 1

Exercise 2.47 P (red ball| given condition) = 40 23


= 0.575.
 8  10−8
Exercise 2.48 (2) P(one person wins eight times) = 10 C8 41 1 − 41 = 405
410
−4
≈ 3.8624 × 10 .

10  1 k  10−k
P(one person wins at least eight times) = 10 Ck 4 1 − 41 ≈ 4.1580 × 10−4 .
k=8
Exercise 2.50 Probability that a person playing piano is a man = 21 .
 50  30 15 5
Exercise 2.51 30,15,5 0.5 0.3 0.2 = 30!15!5!
50! 1 315 1
230 1015 55
≈ 3.125 × 10−4 .

Chapter 3 Random Variables



FX (y + c), y ≥ 0,
Exercise 3.2 Fg(X ) (y) =
⎧ FX (y − c), y < 0.
⎨ FX (y − c), y ≥ c,
Exercise 3.3 FY (y) = FX (0) − P(X = 0), −c ≤ y < c,

FX (y + c), y < −c.

Exercise 3.4 Denoting the solutions to y = a sin(x + θ) by {xi }i=1 ,


f Y (y) = √ 2 2
1
f X (xi ).
a −y i=1
Exercise3.5 For G = 0 and B = 0,
 B n−k  G k
C , k = 0, 1, . . . , n,
p X (k) = n k G+B G+B
0, otherwise.

1, k = 0,
For G = 0, p X (k) =
0, otherwise.

1, k = n,
For B = 0, p X (k) =
0, otherwise.
 1
√ , 1 < y ≤ 2; 6√1y−1 , 2 < y ≤ 5;
Exercise 3.6 pdf: f Y (y) = 3 y−1
0, otherwise.

(In the pdf, the set {1 < y ≤ 2, 2 < y ≤ 5} can be replaced with {1 < y ≤ 2, 2 <
y < 5}, {1 < y < 2, 2 ≤ y ≤ 5}, {1 < y < 2, 2 ≤ y < 5}, {1 < y < 2, 2 < y ≤
5}, or {1 < y < 2, 2 < y < 5}.) 2√
0, √ y ≤ 1; y − 1, 1 ≤ y ≤ 2;
cdf: FY (y) = 1 1 3
+ y − 1, 2 ≤ y ≤ 5; 1, y ≥ 5.
3 3 ⎧
⎨ 0, y < −18; 1
7
, −18 ≤ y < −2;
Exercise 3.7 cdf FY (y) = 73 , −2 ≤ y < 0; 47 , 0 ≤ y < 2;
⎩6
, 2 ≤ y < 18; 1, y ≥ 18.
 −1 7
Exercise 3.8 E X = n−2
1
, n = 3, 4, . . ..
Exercise 3.9 f Y (y) = FX (y)δ(y) + f X (y)u(y) = FX (0)δ(y) + f X (y)u(y), where
FX is the cdf of X . ⎧

⎨ 0,  √  y ≤ 0,
θ
Exercise 3.10 f Y (y) = e√ y cosh  θ y , 0 < y ≤ θ2 ,
1
⎪ √
⎩ √ exp −θ y , y > 2 .
θ 1
2e y θ
FX (x)−FX (b)
Exercise 3.11 FX |b<X ≤a (x) = FX (a)−FX (b)
.
f X (x)
f X |b<X ≤a (x) = FX (a)−FX (b)
u(x − b)u(a − x)
. Var{X |X > a} = (1−a)
2
Exercise 3.12 E{X |X > a} = 1+a 2 12
.
Exercise 3.13 P(950 ≤ R ≤ 1050) = 2 . 1

Exercise 3.14 Let X be the time to take to the location of the appointment with cdf
FX . Then, departing t ∗ minutes before the appointment time will incur the minimum
cost, where FX (t ∗ ) = k+c k
.
Exercise 3.15 P (X ≤ α) = 13 u(α) + 23 u(α − π). P (2 ≤ X < 4) = 23 .
P (X ≤ 0) = 13 .
   
Exercise 3.16 P (U > 0) = 21 . P |U | < 13 = 13 . P |U | ≥ 43 = 41 .
1 
P 3 < U < 21 = 12 1
.
Exercise 3.17 P(A) = P(A ∪ B) = 32 1
. P(B) = P(A ∩ B) = 1024 1
.
P (B c ) = 1024 .
1023

Exercise 3.18 (1) When L 1 = max (0, w A + w B − N ) and U1 = min (w A , w B ),


( N )( N −d )( N −w A )
for d = L 1 , L 1 + 1, . . . , U1 , P(D = d) = d wNA −d N w B −d .
(w A )(w B )
(2) When L 2 = max (0, k − w B ), U2 = min (w A − d, k − d), L 3 = L 1 and
U3 = min (w A + w B , N ), for k = L 3 , L 3 + 1, . . . , U3
U2  
 N −d  N −w A w A −d 
U1 
P(K = k) = N 1 N N
(w A )(w B ) d=L 1 i=L 2 d w A −d w B −d i
 w B −d  w −k+2i
× k−d−i p B (1 − p)k+w A −2d−2i .
⎧1
2 ⎨ 6 , k = 0,
, d = 0,
(3) P(D = d) = 13 P(K = k) = 23 , k = 1,
, d = 1. ⎩1
3
6
, k = 2.
Exercise 3.19 (1) c = 2. (2) E{X } = 2.
Exercise 3.20 (1) P(0 < X < 1) = 14 . P(1 ≤ X < 1.5) = 38 .
(2) μ = 24 19
. σ 2 = 191
576
.
 w−a
0, w < a; , a ≤ w < b;
Exercise 3.21 (1) FW (w) = b−a
1, w ≥ b.
(2) W ∼ U [a, b).   
Exercise 3.22 f Y (y) = 1
(1−y)2
u(y) − u y − 21 .
Exercise 3.23 f Y (y) = 1
(1+y)2
fX y
1+y
.
When X ∼ U [0, 1), f Y (y) = 1
(1+y)2
u(y).
Exercise 3.25 f Z (z) = u(z) − u(z − 1).
Exercise 3.26 f Y (t) = (t+1)2 u(t − 1). f Z (s) = (1−s)2 {u(s + 1) − u(s)}.
2 2

Exercise 3.27 f Y (y) = (2 − 2y)u(y)u(1 − y).


Exercise 3.28 (1) f Y (y) = √ 12 2 u (a − |y|).
π a −y
(2) f Y (y) = √2 u(y)u(1 − y).
π

1−y 2


⎪ 0, y ≤ −1,
⎨4 −1
− 4
cos y, −1 ≤ y ≤ 0,
(3) With 0 ≤ cos−1 y ≤ π, FY (y) = 3 3π −1
⎪ 1 − 3π cos y, 0 ≤ y ≤ 1,

2

1, y ≥ 1.
1 

Exercise 3.29 (1) f Y (y) = 1+y 2 f X (xi ). (2) f Y (y) = π 1+y
1
.
i=1
( 2)
(3) f Y (y) = π 1+y
1
.
( 2)
Exercise 3.30 When X ∼ U [0,1), 
Y = − λ1 ln(1 − X ) ∼ FY (y) = 1 − e−λy u(y).
Exercise 3.31 expected value: E{X } = 3.5. mode: 1, 2, . . . , or 6.
median: any real number in the interval [3, 4].
Exercise 3.32 c = 5 < 101 7
= b.
Exercise 3.33 E{X } = 0. Var{X } = λ22 .
Exercise 3.34 E{X } = α+β α
. Var{X } = (α+β)2αβ (α+β+1)
.
Exercise 3.36 f Y (y) = u(y + 2) − u(y + 1). f Z (z) = 21 {u(z + 4) − u(z + 2)}.
f W (w) = 21 {u(w + 3) − u(w + 1)}.
Exercise 3.37 pY (k) = 41 , k = 3, 4, 5, 6. p Z (k) = 41 , k = −1, 0, 1, 2.
pW (r ) = 41 , r = ± 13 , 0, 15 .
 c
 − c − 0 FX (x)d x, c ≥ 0,
Exercise 3.39 (2) E X c =
c, c < 0.
∞ c
 + {1 − FX (x)}d x + 0 FX (x)d x, c ≥ 0,
(3) E X c = 0∞
0 {1 − FX (x)}d x, c < 0.
Exercise 3.40 (1) E{X } = λμ. Var{X  } = (1 + λ)λμ.
Exercise 3.41 f X (x) = 4πρx 2 exp − 43 πρx 3 u(x).
Exercise 3.42 A = 16 1
. P(X ≤ 6) = 78 .
Exercise 3.44 E{F(X )} = 21 .
Exercise 3.45 M(t) = t+1 1
. ϕ(ω) = 1+1jω . m 1 = −1. m 2 = 2. m 3 = −6. m 4 = 24.
π t
Exercise 3.46 mgf M(t) = π2 02 (tan x) π d x.
Exercise 3.47 α = 2n−1 B̃|β|n , n .
(2 2)


Exercise 3.48 M R (t) = 1 + 2πσt exp σ 2t Φ (σt), where Φ is the standard nor-
2 2

mal cdf. 
1 − 4y , 0 ≤ y < 1; 21 − 4y , 1 ≤ y < 2;
Exercise 3.51 f Y (y) =
0, otherwise.

Exercise 3.52 A cdf such that  n    
∞
 1 i 1 n+1
(locaton of jump, height of jump) = a + (b − a) 2
, 2
i=1 n=0
and the interval between
 adjacent jumps are all the same.
Exercise 3.53⎧ (1) a ≥ 0, a + 13 ≤ b ≤ −3a + 1 .
1√
⎨ 0,  √  x < 0; 4 √
x,  0 ≤ x < 1;
(2) FY (x) = 24 1
11 x − 1 , 1 ≤ x < 4; 18 x + 5 , 4 ≤ x < 9;

1, x ≥ 9.
P(Y = 1) = 6 . P(Y = 4) = 0.
1
 5√
Exercise 3.54 FY (x) = 1 √
0,  x < 0; 8
x, 0 ≤ x < 1;
⎧8 x + 4 , 1 ≤ x < 16; 1, x ≥ 16.

⎪ F X (α), y < −2 or y > 2,

FX (−2) + p X (1), −2 ≤ y < 0,
Exercise 3.57 FY (y) =

⎪ FX (−2) + p X (0) + p X (1), 0 ≤ y < 2,

FX (2), y = 2,
where α is the only real root of y = x 3 − 3x when y > 2 or y < −2.
Exercise 3.58 FY (x) = 0 for x < 0, x for 0 ≤ x < 1, and 1 for x ≥ 1.
Exercise 3.59 (1) For α(θ) = 21 , ϕ(ω) = exp − ω4 .
2

For α(θ) = cos2 θ, ϕ(ω) = exp − ω4 ω2


2
I0 4
.
1
(2) The normal pdf with mean 0 and variance 2
.
(4) E {X 1 } = 0. Var {X 1 } = 21 . E {X 2 } = 0. Var {X 2 } = 21 .

Exercise 3.63 E{X } = π2 . E X 2 = π2 − 2.
2

Exercise 3.65 Var{Y } = σ 2X + 4m +X m −X .


Exercise 3.67 pmf p(k) = (1 − α)k α for k ∈ {0, 1, . . .}: E{X } = 1−α α
.
σ 2X = 1−α
α 2 .
pmf p(k) = (1 − α)k−1 α for k ∈ {1, 2, . . .}: E{X } = α1 . σ 2X = 1−α α2
.
βγ αβγ(α+β−γ)
Exercise 3.68 E{X } = α+β . Var{X } = (α+β)2 (α+β−1) .
λn
Exercise 3.70 For t < λ, MY (t) = (λ−t) n . E{Y } = λ . Var{Y } = λ2 .
n n
  2 y
1 − 2p − 2p + 1 , y ≥ 1,
Exercise 3.71 FY (y) = E{Y } = 2 p(1− 1
.
0, y < 1. p)
  
Exercise 3.74 In b(10, 1/3), P10(k) is the largest at k = ⌊11/3⌋ = 3.
In b(11, 1/2), P11(k) is the largest at k = 5, 6.
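These modes can be verified directly from the binomial pmf (our own check, with the pmf helper named by us):

```python
from math import comb

# pmf of b(n, p); b(10, 1/3) peaks at k = 3, b(11, 1/2) peaks at both k = 5 and 6.
def pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p10 = [pmf(10, 1 / 3, k) for k in range(11)]
assert max(range(11), key=p10.__getitem__) == 3
p11 = [pmf(11, 1 / 2, k) for k in range(12)]
assert p11[5] == p11[6] == max(p11)
```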
Exercise 3.75 (1) P01 = (0.995)¹⁰⁰⁰ + 1000C1 (0.005)¹ (0.995)⁹⁹⁹ ≈ 0.0401.
approximate value with (3.5.17): Φ(2.2417) − Φ(1.7933) ≈ 0.0242.
approximate value with (3.5.18): Φ(2.4658) − Φ(1.5692) ≈ 0.0515.
approximate value with (3.5.19): (5⁰/0! + 5¹/1!) e⁻⁵ ≈ 0.0404.
(2) P456 = 1000C4 (0.005)⁴ (0.995)⁹⁹⁶ + 1000C5 (0.005)⁵ (0.995)⁹⁹⁵ + 1000C6 (0.005)⁶ (0.995)⁹⁹⁴ ≈ 0.4982.
approximate value with (3.5.17): 2{Φ(0.4483) − 0.5} ≈ 0.3472.
approximate value with (3.5.18): 2{Φ(0.6725) − 0.5} ≈ 0.4988.
approximate value with (3.5.19): (5⁴/4! + 5⁵/5! + 5⁶/6!) e⁻⁵ ≈ 0.4972.
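The quoted exact binomial values and the Poisson approximations can be recomputed directly (our own check):

```python
from math import comb, exp, factorial

# X ~ b(1000, 0.005): exact probabilities quoted in Exercise 3.75, plus the
# Poisson approximation with lam = np = 5 as in (3.5.19).
def b(k):
    return comb(1000, k) * 0.005**k * 0.995 ** (1000 - k)

P01 = b(0) + b(1)
P456 = b(4) + b(5) + b(6)
assert abs(P01 - 0.0401) < 5e-4
assert abs(P456 - 0.4982) < 5e-4

def pois(k):
    return exp(-5.0) * 5.0**k / factorial(k)

assert abs(pois(0) + pois(1) - 0.0404) < 5e-4
assert abs(pois(4) + pois(5) + pois(6) - 0.4972) < 5e-4
```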

Exercise 3.77 (2) coefficient of variation = √1 .


λ
(3) skewness = √1 . kurtosis = 3 + λ1 .
λ
√ 
Exercise 3.81 f Y (y) = y
2
2u(y)u(1 − y) + u(y − 1)u 3−y .

Chapter 4 Random Vectors

Exercise 4.2 p X (1) = 35 , p X (2) = 25 . pY (4) = 35 , pY (3) = 25 .


p X,Y (1, 4) = 10 3
, p X,Y (1, 3) = 10 3
, p X,Y (2, 4) = 10 3
, p X,Y (2, 3) = 101
.
pY |X (4|1) = 2 , pY |X (3|1) = 2 , pY |X (4|2) = 4 , pY |X (3|2) = 4 .
1 1 3 1

p X |Y (1|4) = 21 , p X |Y (2|4) = 21 , p X |Y (1|3) = 34 , p X |Y (2|3) = 41 .


p X +Y (4) = 10 3
, p X +Y (5) = 25 , p X +Y (6) = 10 3
.
Exercise 4.3 pairwise independent. not mutually independent.
Exercise 4.4 a = 41 . X and Y are not independent of each other. ρ X Y = 0.
Exercise 4.5 p R|B=3 (0) = 27 8
, p R|B=3 (1) = 49 , p R|B=3 (2) = 29 ,
p R|B=3 (3) = 27 . E{R|B = 1} = 53 .
1

Exercise 4.8 ⎧ √ √
⎪ √1 , 0 < y1 < 1 , − y1 + 1 < y2 < y1 + 21 ,


⎨ √1 , 0 < y < 1 , √ y − 1 < y < −√ y + 1 ,
2 y1 4 2
1 1 2 1
f Y (y1 , y2 ) = y1 4
√ 2 √ 2

⎪ √1 , 0 < y1 < 1 , − y1 − 1 < y2 < y − 1
,

⎩ 2 y1 4 2 1 2
0, otherwise.
 
f Y1 (y) = √1y u(y)u 41 − y . f Y2 (y) = (1 − |y|)u(1 − |y|).


⎪ 0, w < 0,

⎨ π w2 , 0 ≤ w < 1,
Exercise 4.9 FW (w) =
4
π

−1 w 2 −1
√ √

⎪ − sin w
w + w − 1, 1 ≤ w < 2,
2 2


4

⎧π 1, w ≥ 2.
⎪ 2 w,
⎨ 0 ≤ w < 1,
π

−1 w 2 −1

f W (w) = 2 4 − sin w
w, 1 ≤ w < 2,


0, otherwise.

⎨ 1 − e−(μ+λ)w , if w ≥ 0, v ≥ 1,
μ
Exercise 4.10 FW,V (w, v) = μ+λ 1 − e−(μ+λ)w , if w ≥ 0, 0 ≤ v < 1,

0, otherwise.
2
Exercise 4.11 fU (v) = (1+v)4 u(v).
3v
  )    √   √ 
Exercise 4.12 (1) f Y y1 , y2 = 2u(y √ 1 u 1 − y1 u y2 −
y1
y1 u 1 − y2 + y1 .

⎨ y, 0 < y ≤ 1,
f Y1 (y) = 2√1 y u(y)u(1 − y). f Y2 (y) = 2 − y, 1 < y ≤ 2,

0, otherwise.
 √   √ 
(2) f Y (y1 , y2 ) = √1y1 u (y1 ) u 1 − y2 + y1 u y2 − 2 y1 .

 1 ⎨ y2 , 0 < y2 ≤ 1,
√ − 1, 0 < y1 ≤ 1,
f Y1 (y1 ) = y1
f Y2 (y2 ) = 2 − y2 , 1 < y2 ≤ 2,
0, otherwise. ⎩
0, otherwise.
Exercise 4.13 (1) f Y (y1 , y2 ) = √ y11+y2 u (y1 + y2 ) u (1 − y1 − y2 ) u (y1 − y2 ) u
(1 − y1 + y2 ).√  √ 
2 2y1 , 0 < y1 ≤ 21 ; 2 1 − 2y1 − 1 , 21 < y1 ≤ 1;
f Y1 (y1 ) =
0, otherwise.
 √  √ 
2 2y2 + 1, − 21 < y2 ≤ 0; 2 1 − 2y2 , 0 < y2 ≤ 21 ;
f Y2 (y2 ) =
0, otherwise.
 √ 
(2) f Y (y1 , y2 ) = √ y12+y2 u (y1 + y2 ) u (1 − y1 + y2 ) u y1 − y2 − y1 + y2 .
⎧ √ 
⎨ 2 √8y1 + 1 − 1 , √ 0 < y1 ≤ 21 ,
f Y1 (y1 ) = 2 8y1 + 1 − 1 − 4 2y1 − 1, 21 < y1 ≤ 1,

⎧ 0,√ otherwise.
⎨ 4 √2y2 + 1, √  − 1
2
< y2 ≤ − 18 ,
f Y2 (y2 ) = 4 2y2 + 1 − 8y2 + 1 , − 8 < y2 ≤ 0, 1

0, otherwise.
z α1 +α2 −1
Exercise 4.14 f Z (z) = β α1 +α2 Γ (α1 +α2 )
exp − βz u(z).
Γ (α1 +α2 ) w α1 −1
f W (w) = Γ (α1 )Γ (α2 ) (1+w)α1 +α2
u(w).
 y 2r1 − r −1 1 − 21
Exercise 4.15 (1) f_{Y1}(y1) = {1/(2r)} u(y1) ∫_{−y1^{1/(2r)}}^{y1^{1/(2r)}} y1^{(1/r)−1} (y1^{1/r} − y2²)^{−1/2} {f_X(√(y1^{1/r} − y2²), y2) + f_X(−√(y1^{1/r} − y2²), y2)} dy2.
(3) For r = 1/2, F_W(w) = 0, w < 0; w², w ∈ [0, 1]; 1, w ≥ 1. f_W(w) = 2w, w ∈ [0, 1]; 0, otherwise.
For r = 1, F_W(w) = 0, w < 0; w, w ∈ [0, 1]; 1, w ≥ 1. f_W(w) = 1, w ∈ [0, 1]; 0, otherwise.
For r = −1, F_W(w) = 0, w < 1; 1 − w⁻¹, w ≥ 1. f_W(w) = 0, w < 1; w⁻², w > 1.
Exercise 4.16 (1) f_{Y1,Y2}(y1, y2) =
(1/2)(y1 − |y2|), (y1, y2) ∈ (1:3) ∪ (2:3);
1 − |y2|, (y1, y2) ∈ (1:2) ∪ (2:1);
(1/2)(3 − y1 − |y2|), (y1, y2) ∈ (3:2) ∪ (3:1);
1/2, (y1, y2) ∈ (3:3);
0, otherwise
(refer to Fig. A.1).
(2) f_{Y2}(y) = (1 − |y|) u(1 − |y|).

Fig. A.1 The regions for f_{Y1,Y2}(y1, y2) in Exercise 4.16 (figure: the (y1, y2) plane, 0 ≤ y1 ≤ 3 and −1 ≤ y2 ≤ 1, divided into the regions (1:2), (1:3), (2:1), (2:3), (3:1), (3:2), and (3:3))

(3) f_{Y1}(y) = (1/2)y², 0 ≤ y ≤ 1; −y² + 3y − 3/2, 1 ≤ y ≤ 2; (1/2)(3 − y)², 2 ≤ y ≤ 3; 0, y ≤ 0 or y ≥ 3.

Exercise 4.17 (1) p_{X+Y}(v) = (v − 1)(1 − α)² α^{v−2} ũ(v − 2).
p_{X−Y}(w) = {(1 − α)/(1 + α)} α^{|w|} ũ(|w|).
(2) p_{X−Y,X}(w, x) = (1 − α)² α^{2x−w−2} ũ(x − 1) ũ(x − w − 1).
p_{X−Y,Y}(w, y) = (1 − α)² α^{w+2y−2} ũ(w + y − 1) ũ(y − 1).
(3) p_{X−Y}(w) = {(1 − α)/(1 + α)} α^{|w|} ũ(|w|). p_X(x) = (1 − α)α^{x−1} ũ(x − 1).
p_Y(y) = (1 − α)α^{y−1} ũ(y − 1).
(4) p_{X+Y,X−Y}(v, w) = (1 − α)² α^{v−2} ũ((v + w)/2 − 1) ũ((v − w)/2 − 1).
p_{X+Y}(v) = (v − 1)(1 − α)² α^{v−2} ũ(v − 2). p_{X−Y}(w) = {(1 − α)/(1 + α)} α^{|w|} ũ(|w|),
the same as the results in (1).
Exercise 4.18 E{R_n} = (7/6)ⁿ. p_2(k) = 1/64, k = 4; 1/12, k = 3; 83/288, k = 2; 17/36, k = 1; 9/64, k = 0.
η_0 = 1/3.
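The closed-form pmfs in Exercise 4.17 can be verified by direct enumeration for i.i.d. geometric X and Y with p(x) = (1 − α)α^{x−1}; the value of α below is an illustrative choice, not part of the exercise:

```python
# Numerical check of the Exercise 4.17 pmfs for i.i.d. geometric X, Y
alpha = 0.4
N = 200  # truncation point; alpha**(2*N) is negligible
p = lambda x: (1 - alpha) * alpha ** (x - 1)

v = 7
pv = sum(p(x) * p(v - x) for x in range(1, v))            # P(X + Y = 7)
formula_v = (v - 1) * (1 - alpha) ** 2 * alpha ** (v - 2)

w = 3
pw = sum(p(x) * p(x - w) for x in range(w + 1, N))        # P(X - Y = 3)
formula_w = (1 - alpha) / (1 + alpha) * alpha ** w
print(abs(pv - formula_v), abs(pw - formula_w))  # both essentially zero
```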

Exercise 4.19 E{X |Y = y} = 2 + y for y ≥ 0.


Exercise 4.20 f_Y(y) = y1² y2 e^{−y1} u(y1) u(y2) u(1 − y2) u(y3) u(1 − y3).
  u(y1 +y2 )      
Exercise 4.21 f_Y(y1, y2) = (1/2) u(y1 + y2) u(2 − y1 − y2) u(y1 − y2) u(2 − y1 + y2).
f_{Y1}(y1) = y1, 0 < y1 ≤ 1; 2 − y1, 1 ≤ y1 < 2; 0, otherwise.
f_{Y2}(y2) = 1 + y2, −1 < y2 ≤ 0; 1 − y2, 0 ≤ y2 < 1; 0, otherwise.
Exercise 4.22 f Y1 ,Y2 (y1 , y2 ) = f X 1 ,X 2 (y1 cos θ − y2 sin θ, y1 sin θ + y2 cos θ).
Exercise 4.24 F_{X,Y|A}(x, y) =
1, region 1−3;
1 − (1/(πa²)){−xψ(x) + a²θ_x − (π/2)a²}, region 1−2;
1 − (1/(πa²)){−yψ(y) + a²θ_y − (π/2)a²}, region 1−4;
1 − (1/(πa²)){a² cos⁻¹(y/a) − yψ(y) − xψ(x) + a²θ_x − (π/2)a²}, region 1−5;
(1/(πa²)){xy − (a²/2)θ_y + (y/2)ψ(y) + (x/2)ψ(x) − (a²/2) cos⁻¹(x/a) + πa²}, region 1−1 or 2−1;
(1/(πa²)){xy − (a²/2)θ_x + (x/2)ψ(x) + (y/2)ψ(y) − (a²/2) cos⁻¹(y/a) + πa²}, region 4−1 or 1−1;
(1/(πa²)){xy − (a²/2) cos⁻¹(y/a) + (y/2)ψ(y) + (x/2)ψ(x) + (a²/2)θ_x}, region 2−1 or 3−1;
(1/(πa²)){xy − (a²/2) cos⁻¹(x/a) + (x/2)ψ(x) + (y/2)ψ(y) + (a²/2)θ_y}, region 3−1 or 4−1;
(1/(πa²)){xψ(x) + a²θ_x − (π/2)a²}, region 2−2;
(1/(πa²)){yψ(y) + a²θ_y − (π/2)a²}, region 4−2;
0, otherwise.
Here, ψ(t) = √(a² − t²), θ_w = cos⁻¹{−ψ(w)/a}, and 'region' is shown in Fig. A.2.
f_{X,Y|A}(x, y) = {1/(πa²)} u(a² − x² − y²).

Fig. A.2 The regions of F_{X,Y|A}(u, v) in Exercise 4.24 (figure: the disk u² + v² ≤ a² and its neighborhood in the (u, v) plane, divided into the regions 1−1 through 4−3)
Exercise 4.28 A linear transformation transforming X into an uncorrelated random vector:
A = [ 1/√2  −1/√2   0
      1/√6   1/√6  −2/√6
      1/√3   1/√3   1/√3 ].
A linear transformation transforming X into an uncorrelated random vector with unit variance:
[ 1/√2     −1/√2      0
  1/√6      1/√6     −2/√6
  1/(2√3)   1/(2√3)   1/(2√3) ].
Exercise 4.29 f_Y(y) = exp(−y) u(y).
Exercise 4.30 p_X(1) = 5/8, p_X(2) = 3/8. p_Y(1) = 3/4, p_Y(2) = 1/4.
Exercise 4.31 pmf of M = max(X1, X2):
P(M = m) = e^{−2λ} (λ^m/m!) {2 Σ_{k=0}^{m} λ^k/k! − λ^m/m!} ũ(m).
pmf of N = min(X1, X2):
P(N = n) = e^{−2λ} (λ^n/n!) {2 Σ_{k=n+1}^{∞} λ^k/k! + λ^n/n!} ũ(n).
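The pmf of the maximum in Exercise 4.31 can be checked against direct enumeration, since P(M = m) = 2 p(m) F(m − 1) + p(m)² for i.i.d. X1, X2; the rate λ below is an illustrative choice:

```python
import math

lam = 2.3  # illustrative Poisson rate
pois = lambda k: math.exp(-lam) * lam ** k / math.factorial(k)

def p_max(m):
    # P(max(X1, X2) = m) by direct probability bookkeeping
    return 2 * pois(m) * sum(pois(k) for k in range(m)) + pois(m) ** 2

m = 4
closed = math.exp(-2 * lam) * lam ** m / math.factorial(m) * (
    2 * sum(lam ** k / math.factorial(k) for k in range(m + 1))
    - lam ** m / math.factorial(m))
print(abs(p_max(m) - closed))  # essentially zero
```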
Exercise 4.32 f_W(w) = (1/2){u(w) − u(w − 2)}. f_U(v) = u(v + 1) − u(v).
f_Z(z) = 0, z ≤ −1 or z > 2; (z + 1)/2, −1 < z ≤ 0; 1/2, 0 < z ≤ 1; (2 − z)/2, 1 < z ≤ 2.
Exercise 4.33 f_Y(t) = (1/2)(t + 3/2)², −3/2 ≤ t < −1/2; 3/4 − t², −1/2 ≤ t < 1/2; (1/2)(t − 3/2)², 1/2 ≤ t < 3/2; 0, t > 3/2 or t < −3/2. E{Y⁴} = 13/80.
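The fourth moment in Exercise 4.33 can be confirmed by numerically integrating t⁴ against the piecewise-quadratic density (this density is that of the sum of three independent U(−1/2, 1/2) variables):

```python
# Check E{Y^4} = 13/80 for the Exercise 4.33 density
def f(t):
    if -1.5 <= t < -0.5:
        return 0.5 * (t + 1.5) ** 2
    if -0.5 <= t < 0.5:
        return 0.75 - t * t
    if 0.5 <= t < 1.5:
        return 0.5 * (t - 1.5) ** 2
    return 0.0

h = 1e-4
# Riemann sum of t^4 f(t) over [-1.5, 1.5]
m4 = h * sum(f(-1.5 + k * h) * (-1.5 + k * h) ** 4 for k in range(30000))
print(m4)  # close to 13/80 = 0.1625
```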
Exercise 4.34 f_Y(y) = f_{X1}(y1) f_{X2}(y2 − y1) ⋯ f_{Xn}(yn − y_{n−1}), where Y = (Y1, Y2, ..., Yn) and y = (y1, y2, ..., yn).
Exercise 4.35 (1) p_X(x) = (2x + 5)/16, x = 1, 2. p_Y(y) = (3 + 2y)/32, y = 1, 2, 3, 4.
(2) P(X > Y) = 3/32. P(Y = 2X) = 9/32. P(X + Y = 3) = 3/16.
P(X ≤ 3 − Y) = 1/4. (3) not independent.

Exercise 4.36 f_Y(y) = 0, y ≤ 0; 1 − e^{−y}, 0 < y < 1; (e − 1)e^{−y}, y ≥ 1.
Exercise 4.37 (1) M_Y(t) = exp{7(e^t − 1)}. (2) Poisson distribution P(7).
Exercise 4.38 k = 2/3. f_{Z|X,Y}(z|x, y) = (x + y + z)/(x + y + 1/2), 0 ≤ x, y, z ≤ 1.
Exercise 4.39 E{exp(−Λ)|X = 1} = 4/9.
Exercise 4.40 f_{X,Y,Z}(x, y, z) = f_{U1}(x) f_{U2}(y − x) f_{U3}(z − y).
Exercise 4.41 (1) f_{X,Y}(x, y) = {3/(2π)} √(1 − x² − y²) u(1 − x² − y²).
f_X(x) = (3/4)(1 − x²) u(1 − |x|).
(2) f_{X,Y|Z}(x, y|z) = {1/(π(1 − z²))} u(1 − x² − y² − z²).
not independent of each other.
Exercise 4.42 (1) c = 1/(2r²). f_X(x) = (x + r)/r², −r ≤ x ≤ 0; (r − x)/r², 0 ≤ x ≤ r.
(2) not independent of each other. (3) f_Z(z) = (2z/r²) u(z) u(r − z).
 √
Exercise 4.44 (1) c = 8/π. f_X(x) = f_Y(x) = {8/(3π)}(1 + 2x²)√(1 − x²) u(x) u(1 − x).
X and Y are not independent of each other.
(2) f_{R,Θ}(r, θ) = (8/π)r³, 0 ≤ r < 1, 0 ≤ θ < π/2. (3) p_Q(q) = 1/8, q = 1, 2, ..., 8.
Exercise 4.45 The probability that the battery with pdf of lifetime f lasts longer than that with g is μ/(λ + μ). When λ = μ, the probability is 1/2.
Exercise 4.46 (1) fU (x) = xe−x u(x). f V (x) = 21 e−|x| .
∞ g
f X Y (g) = 0 e−x e− x x1 d xu(g). f YX (w) = (1+w) u(w)
2.

For Z = X +Y = 1+W , f Z (z) = u(z)u(1 − z).


X W
 
f min(X,Y ) (z) = 2e−2z u(z). f max(X,Y ) (z) = 2 1 − e−z e−z u(z).
f min(X,Y ) (x) = (1+x)
2
2 u(x)u(1 − x). (2) f V |U (x|y) = 2y u(y)u(|y| − x).
1
max(X,Y )
Exercise 4.48 (1) E{M} = 1. Var{M} = 1.
Exercise 4.49 expected value = {1/(2p − 1)}[n − {(1 − p)/(2p − 1)}{1 − ((1 − p)/p)ⁿ}], p ≠ 1/2; n², p = 1/2.
Exercise 4.50 E{N} = (1 + 2p − p²)/{p²(2 − p)}.
Exercise 4.51 (1) μ1 = (2 − p1p2 + p1²p2)/(2p1p2 − p1²p2 − p1p2² + p1²p2²). μ2 = (2 − p1p2 + p1p2²)/(2p1p2 − p1²p2 − p1p2² + p1²p2²).
(2) h1 = (p1 + p2 − p1²p2 + p1²p2²)/(2p1p2 − p1²p2 − p1p2² + p1²p2²). h2 = (p1 + p2 − p1p2² + p1²p2²)/(2p1p2 − p1²p2 − p1p2² + p1²p2²).
Exercise 4.52 α1 = (p1² − 2p1²p2 + p1p2)/{1 − (1 − p1)(1 − p2)}². α2 = p1²(1 − p2)/{1 − (1 − p1)(1 − p2)}².
Exercise 4.53 integral equation: g(x) = 1 + ∫_x^1 g(y) dy. g(x) = e^{1−x}.
Exercise 4.54 (1) g(k) = g(k − 1)q + g(k − 2)pq + g(k − 3)p²q + δ_{k3} p³.
(2) G_X(s) = p³s³/(1 − qs − pqs² − p²qs³). (3) E{X} = (1 + p + p²)/p³.
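The mean in Exercise 4.54 (3) can be checked by numerically differentiating the pgf at s = 1, since E{X} = G_X′(1); the value of p below is an illustrative choice:

```python
# Check E{X} = (1 + p + p^2)/p^3 by differentiating the pgf at s = 1
p = 0.3
q = 1 - p
G = lambda s: p**3 * s**3 / (1 - q*s - p*q*s**2 - p**2*q*s**3)

eps = 1e-7
deriv = (G(1) - G(1 - eps)) / eps      # one-sided difference from below
mean = (1 + p + p**2) / p**3
print(deriv, mean)  # agree to within the discretization error
```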
Exercise 4.55 F_{X,Y|A}(x, y) = 0, x < x1; {F_{X,Y}(x, y) − F_{X,Y}(x1, y)}/{F_X(x2) − F_X(x1)}, x1 ≤ x < x2; {F_{X,Y}(x2, y) − F_{X,Y}(x1, y)}/{F_X(x2) − F_X(x1)}, x ≥ x2.
f_{X,Y|A}(x, y) = [f_{X,Y}(x, y)/{F_X(x2) − F_X(x1)}] u(x − x1) u(x2 − x).
Exercise 4.56
p_{X,Y}(3, 0) = 1/12, p_{X,Y}(4, 0) = 3/12, p_{X,Y}(5, 0) = 2/12,
p_{X,Y}(3, 1) = 1/12, p_{X,Y}(4, 1) = 3/12, p_{X,Y}(5, 1) = 2/12.
p_{Z|X}(z|x) = 1/2, x = 3, 4, 5, z = x − 1, x; 0, otherwise.
p_{X,Z}(x, z) = 1/12, x = 3, z = 2, 3; 3/12, x = 4, z = 3, 4; 2/12, x = 5, z = 4, 5; 0, otherwise.
p_{X|Z}(3|2) = 1, p_{X|Z}(3|3) = 1/4, p_{X|Z}(3|4) = 0, p_{X|Z}(3|5) = 0,
p_{X|Z}(4|2) = 0, p_{X|Z}(4|3) = 3/4, p_{X|Z}(4|4) = 3/5, p_{X|Z}(4|5) = 0,
p_{X|Z}(5|2) = 0, p_{X|Z}(5|3) = 0, p_{X|Z}(5|4) = 2/5, p_{X|Z}(5|5) = 1.
p_{Z|Y}(2|0) = 0, p_{Z|Y}(3|0) = 1/6, p_{Z|Y}(4|0) = 3/6, p_{Z|Y}(5|0) = 1/3,
p_{Z|Y}(2|1) = 1/6, p_{Z|Y}(3|1) = 3/6, p_{Z|Y}(4|1) = 1/3, p_{Z|Y}(5|1) = 0.
p_{Y,Z}(0, 2) = 0, p_{Y,Z}(0, 3) = 1/12, p_{Y,Z}(0, 4) = 3/12, p_{Y,Z}(0, 5) = 2/12,
p_{Y,Z}(1, 2) = 1/12, p_{Y,Z}(1, 3) = 3/12, p_{Y,Z}(1, 4) = 2/12, p_{Y,Z}(1, 5) = 0.
p_{Y|Z}(0|2) = 0, p_{Y|Z}(0|3) = 1/4, p_{Y|Z}(0|4) = 3/5, p_{Y|Z}(0|5) = 1,
p_{Y|Z}(1|2) = 1, p_{Y|Z}(1|3) = 3/4, p_{Y|Z}(1|4) = 2/5, p_{Y|Z}(1|5) = 0.
Exercise 4.57 (1) E{U} = 1/(λ1 + λ2). E{V − U} = 1/λ1 + 1/λ2 − 2/(λ1 + λ2).
E{V} = 1/λ1 + λ1/{λ2(λ1 + λ2)}. (2) E{V} = 1/λ1 + λ1/{λ2(λ1 + λ2)}.
(3) f_{U,V−U,I}(x, y, i) = λ1λ2 e^{−(λ1+λ2)x} {δ(i − 1)e^{−λ2 y} + δ(i − 2)e^{−λ1 y}} u(x) u(y).
(4) independent.
Exercise 4.58 f_X(x) = {Γ(p1 + p2 + p3)/(Γ(p1)Γ(p2 + p3))} x^{p1−1} (1 − x)^{p2+p3−1} u(x) u(1 − x).
f_Y(y) = {Γ(p1 + p2 + p3)/(Γ(p2)Γ(p3 + p1))} y^{p2−1} (1 − y)^{p3+p1−1} u(y) u(1 − y).
f_{X|Y}(x|y) = {Γ(p1 + p3)/(Γ(p1)Γ(p3))} x^{p1−1} (1 − x − y)^{p3−1} (1 − y)^{1−p1−p3} u(x) u(y) u(1 − x − y).
f_{Y|X}(y|x) = {Γ(p2 + p3)/(Γ(p2)Γ(p3))} y^{p2−1} (1 − x − y)^{p3−1} (1 − x)^{1−p2−p3} u(x) u(y) u(1 − x − y).
f_{Y/(1−X),X}(z, x) = {Γ(p1 + p2 + p3)/(Γ(p1)Γ(p2)Γ(p3))} z^{p2−1} (1 − z)^{p3−1} x^{p1−1} (1 − x)^{p2+p3−1} u(x) u(z) u(1 − x) u(1 − z).
f_{Y/(1−X)|X}(z|x) = {Γ(p2 + p3)/(Γ(p2)Γ(p3))} z^{p2−1} (1 − z)^{p3−1} u(x) u(z) u(1 − x) u(1 − z).
Exercise 4.59 F_{X,Y|B}(x, y) =
1, x ≥ 1, y ≥ 1;
0, x ≤ −1, y ≤ −1, or y ≤ −x − 1;
(1/2)(x + 1)², −1 ≤ x ≤ 0, y ≥ x + 1;
(1/2)(y + 1)², −1 ≤ y ≤ 0, y ≤ x − 1;
(1/2){2 − (1 − x)²}, 0 ≤ x ≤ 1, y ≥ 1;
(1/2){2 − (1 − y)²}, x ≥ 1, 0 ≤ y ≤ 1;
(1/2){2 − (1 − x)² − (1 − y)²}, x ≤ 1, y ≤ 1, y ≥ −x + 1;
(1/4)(x + y + 1)², x ≤ 0, y ≤ 0, y ≥ −x − 1;
(1/4){(x + y + 1)² − 2x²}, x ≥ 0, y ≤ 0, y ≥ x − 1;
(1/4){(x + y + 1)² − 2y²}, x ≤ 0, y ≥ 0, y ≤ x + 1;
(1/4){(x + y + 1)² − 2(x² + y²)}, x ≥ 0, y ≥ 0, y ≤ −x + 1.
f_{X,Y|B}(x, y) = (1/2) u(1 − |x| − |y|).
Exercise 4.60 F_{X,Y|A}(x, y) =
1, region 1−3;
1 − {1/(2a⁴)}ψ⁴(x), region 1−2;
1 − {1/(2a⁴)}ψ⁴(y), region 1−4;
1 − {1/(2a⁴)}{ψ⁴(x) + ψ⁴(y)}, region 1−5;
{1/(4a⁴)}{(a² + x² + y²)² − 2(x⁴ + y⁴)}, region 1−1;
{1/(4a⁴)}{(ψ²(x) + y²)² − 2y⁴}, region 2−1;
{1/(2a⁴)}ψ⁴(x), region 2−2;
{1/(4a⁴)}{(ψ²(y) + x²)² − 2x⁴}, region 4−1;
{1/(2a⁴)}ψ⁴(y), region 4−2;
{1/(4a⁴)}{ψ²(x) − y²}², region 3−1;
0, otherwise,
where ψ(t) = √(a² − t²) (refer to Fig. A.2).
f_{X,Y|A}(x, y) = (2|xy|/a⁴) u(ψ²(x) − y²).
Exercise 4.63 f_X(x) = {n!/((n − i)!(i − 1)!)} F^{i−1}(x) {1 − F(x)}^{n−i} f(x).
f_Y(y) = {n!/((n − k)!(k − 1)!)} F^{k−1}(y) {1 − F(y)}^{n−k} f(y).
Exercise 4.64 (1) p_{X1,X2,X3,X4}(x1, x2, x3, x4) = {N!/(x1! x2! x3! x4!)} {p1(1 − p2)}^{x1} {(1 − p1)p2}^{x2} (p1p2)^{x3} {(1 − p1)(1 − p2)}^{x4} if Σ_{i=1}^{4} xi = N; 0, otherwise.
(3) p̂1 = X3/(X2 + X3). p̂2 = X3/(X1 + X3). λ̂ = (X2 + X3)(X1 + X3)/X3. (4) X̂4 = X1X2/X3.
Exercise 4.65 (1) ρ_{X|X|} = 0. (2) ρ_{X|X|} = 1. (3) ρ_{X|X|} = −1.
Exercise 4.66 f_{X,2X+1}(x, y) = (1/2){u(x) − u(x − 1)} δ((y − 1)/2 − x).
Exercise 4.67 For x ∈ {x : f_X(x) > 0},
f_{Y|X}(y|x) = {δ(x + y) + δ(x − y)} u(y).
Exercise 4.69 (1) F_{X,Y}(x, y) = [{F(x) − F(−√y)} u(x + √y) − {F(x) − F(√y)} u(x − √y)] u(y) = {F(min(x, √y)) − F(−√y)} u(y) u(x + √y).
(2) f_{X,Y}(x, y) = {f(x)/(2√y)} {δ(x + √y) + δ(x − √y)} u(y) = {f(x)/(2|x|)} δ(√y − |x|) u(y).
(3) f_{X|Y}(x|y) = [f(x)/{f(√y) + f(−√y)}] {δ(x + √y) + δ(x − √y)} u(y).
Exercise 4.71 F_{X,Y}(x, y) = {F_X(x) u(y − x) + F_X(y) u(x − y)} u(y).
f_{X,Y}(x, y) = f_X(x) {u(y)δ(y − x) + u(y − x)δ(y)}.
Exercise 4.73 f_{X1}(t) = 6t(1 − t) u(t) u(1 − t). f_{X2}(t) = (3/2)(1 − |t|)² u(1 − |t|).
Exercise 4.74 (1) f_Y(y1, y2) = {1/(2|y1|)} u(y2/y1 + y1) u(1 − (1/2)(y2/y1 + y1)) u(y2/y1 − y1) u(1 − (1/2)(y2/y1 − y1)).
(2) f_{Y1}(y) = (1 − |y|) u(1 − |y|).
f_{Y2}(y) = (1/2) u(1 − |y|) ln{(1 + √(1 − |y|))/√|y|} = (1/4) u(1 − |y|) ln{(1 + √(1 − |y|))/(1 − √(1 − |y|))}.

Chapter 5 Normal Random Vectors

Exercise 5.1 (3) The vector (X, Y) is not a bi-variate normal random vector.
(4) The random variables X and Y are not independent of each other.
Exercise 5.2 f_{X1,X2|X1²+X2²<a²}(x, y) = [exp{−(x² + y²)/2}/(2π{1 − exp(−a²/2)})] u(a² − x² − y²).
Exercise 5.3 (1) f_{U,V}(t, v) = (t/π) exp(−t²/2) u(t) {u(v + π/2) − u(v − π/2)}.
(2) f_{U,V}(t, v) = {1/(π√(2v))} exp{−(2t² + v)/2} u(v).
Exercise 5.4 f_{Y|X}(y|x) = {1/√(3π)} exp[−(1/3){y − 4 − (x − 3)/2}²].
f_{X|Y}(x|y) = √(2/(3π)) exp[−(2/3){x − 3 − (y − 4)/2}²].

Exercise 5.5 ρ_{ZW} = (σ2² − σ1²) cos θ sin θ / √{(σ1² + σ2²)² cos²θ sin²θ + σ1²σ2²(cos²θ − sin²θ)²}.
2

Exercise 5.8 (1) F_{X|Y}(x|y) = 0, x < −|y|; 1/2, −|y| ≤ x < |y|; 1, x ≥ |y|.
(2) F_X(x) = {1/√(2π)} ∫_{−∞}^{x} exp(−v²/2) dv. The random variable X is normal.
(3) The vector (X, Y) is not a normal random vector.
(4) f_{X|Y}(x|y) = (1/2)δ(x + y) + (1/2)δ(x − y).
f_{X,Y}(x, y) = (1/2){δ(x + y) + δ(x − y)} {1/√(2π)} exp(−y²/2).
Exercise 5.9
[ 1/√2    0       −1/√2
  3/√34   4/√34    3/√34
  2/√17  −3/√17    2/√17 ]

Exercise 5.10 f_C(r, θ) = r f_X(r cos θ, r sin θ) u(r) u(π − |θ|). The random vector C is an independent random vector when X is an i.i.d. random vector with marginal distribution N(0, σ²).
Exercise 5.12 The conditional distribution of X3 when X1 = X2 = 1 is N(1, 2).
Exercise 5.13 acσ_X² + (bc + ad)ρσ_Xσ_Y + bdσ_Y² = 0.
Exercise 5.14 E{X²Y} = 0. E{X³Y} = 3ρσ_X³σ_Y. E{XY³} = 3ρσ_Xσ_Y³.
E{X²Y²} = (1 + 2ρ²)σ_X²σ_Y².
Exercise 5.16
E{Z²W²} = (1 + 2ρ²)σ1²σ2² + m2²σ1² + m1²σ2² + m1²m2² + 4m1m2ρσ1σ2.
Exercise 5.17 μ51 = 15ρσ1⁵σ2. μ42 = 3(1 + 4ρ²)σ1⁴σ2².
μ33 = 3ρ(3 + 2ρ²)σ1³σ2³.
Exercise 5.19 E{|X Y |} = 2 + 62 π.
 
Exercise 5.20 E{|X1|} = √(2/π). E{|X1X2|} = (2/π){√(1 − ρ²) + ρ sin⁻¹ρ}.
E{|X1X2³|} = (2/π){3ρ sin⁻¹ρ + (2 + ρ²)√(1 − ρ²)}.
1 ×2!4! = 4320.
Exercise 5.25 23!4!4!5!
Exercise 5.30 f_X(x) = r/{π(x² + r²)}.
Exercise 5.32 E{Y} = n + δ. σ_Y² = 2n + 4δ.
Exercise 5.34 E{Z} = δ √(n/2) Γ((n − 1)/2)/Γ(n/2) for n > 1.
Var{Z} = n(1 + δ²)/(n − 2) − (nδ²/2){Γ((n − 1)/2)/Γ(n/2)}² for n > 2.
Exercise 5.36 E{H} = n(m + δ)/{m(n − 2)} for n > 2.
Var{H} = 2n²{(m + δ)² + (n − 2)(m + 2δ)}/{m²(n − 4)(n − 2)²} for n > 4.
  μ4 3(n−1)μ22
Exercise 5.40 μ4(X̄_n) = μ4/n³ + 3(n − 1)μ2²/n³.
Exercise 5.42 (1) pdf: f_V(v) = e^{−v} u(v). cdf: F_V(v) = (1 − e^{−v}) u(v).
Exercise 5.43 R_Y = (2β²/π) sin⁻¹{ρ/(1 + α²)}. ρ_Y = sin⁻¹{ρ/(1 + α²)}/sin⁻¹{1/(1 + α²)}.
ρ_Y|_{α²=1} = (6/π) sin⁻¹(ρ/2). lim_{α²→0} ρ_Y = (2/π) sin⁻¹ρ.
Chapter 6 Convergence of Random Variables

Exercise 6.1 {X_n(ω)} converges almost surely, but not surely, to X(ω).
Exercise 6.2 {X_n(ω)} converges almost surely, but not surely, to X(ω).
Exercise 6.3 (1) S_n ~ NB(n, α). (2) S_n ~ NB(Σ_{i=1}^{n} r_i, α).
(3) S_n ~ P(Σ_{i=1}^{n} λ_i). (4) S_n ~ G(Σ_{i=1}^{n} α_i, β). (5) S_n ~ C(Σ_{i=1}^{n} μ_i, Σ_{i=1}^{n} θ_i).
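The Poisson closure in Exercise 6.3 (3) can be verified term-by-term: convolving two Poisson pmfs reproduces the Poisson pmf with the summed rate. The rates below are illustrative choices:

```python
import math

# If X1 ~ P(l1) and X2 ~ P(l2) are independent, then X1 + X2 ~ P(l1 + l2)
l1, l2 = 1.2, 0.7
pois = lambda lam, k: math.exp(-lam) * lam ** k / math.factorial(k)

n = 5
conv = sum(pois(l1, k) * pois(l2, n - k) for k in range(n + 1))
print(abs(conv - pois(l1 + l2, n)))  # essentially zero
```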
Exercise 6.4 S_n/n → m in probability.
Exercise 6.7 The weak law of large numbers holds true.
Exercise 6.10 P(50 < S_n ≤ 80) = P(50.5 < S_n < 80.5) ≈ 0.9348.
Exercise 6.11 E{U} = p(1 − p) + np². E{V} = np².
Exercise 6.12 mgf of Y: M_Y(t) = 1/(2 − e^{2t}). expected value = 2. variance = 8.
Exercise 6.13 (1) P_3 = P_5 = 1/2. P_4 = 5/16. (2) lim_{n→∞} P_n = 1/2.
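The value P_4 = 5/16 in Exercise 6.13 is consistent with counting sequences of four fair coin tosses that contain strictly more heads than tails (a hypothetical reading of the exercise, but one that also yields P_3 = P_5 = 1/2 and limit 1/2):

```python
from itertools import product

# Enumerate all 2^4 toss sequences; count those with more heads (1) than tails (0)
count = sum(1 for seq in product((0, 1), repeat=4) if sum(seq) > 2)
print(count, "/ 16")  # 5 / 16
```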
Exercise 6.14 lim_{n→∞} F_n(x) = u(x). It is a cdf.
Exercise 6.15 It is convergent. F_{Y_n}(y) → {1 − exp(−y/θ)} u(y).
Exercise 6.16 lim_{n→∞} F_{Y_n}(y) = u(y).
Exercise 6.17 It is convergent. F_n(x) → u(x).
Exercise 6.18 (1) φ_{X̂_n}(w) = φ_X(w) Π_{i=1}^{n} φ_{W_i}(w/n). E{X̂_n} = E{X}.
Var{X̂_n} = Var{X} + (1/n²) Σ_{i=1}^{n} σ_i².
(2) Cov(Y_i, Y_j) = Var{X} + σ_i² for i = j and Var{X} for i ≠ j.
(3) Denoting σ² = (1/n²) Σ_{i=1}^{n} σ_i², f_{ε_n}(α) = {1/√(2πσ²)} exp{−α²/(2σ²)}.
f_{X̂_n|X}(α|β) = {1/√(2πσ²)} exp{−(α − β)²/(2σ²)}.
(4) X̂_n is mean square convergent to X.
Exercise 6.26 a_n = nσ². b_n = nμ4.
Exercise 6.27 α ≤ 0.
Exercise 6.29 Exact value: P(S ≥ 3) = 1 − 5e⁻² ≈ 0.3233.
Approximate value: P(S ≥ 3) = P((S − 2)/√2 ≥ 1/√2) = P(Z ≥ 1/√2) ≈ 0.2398,
or P(S ≥ 3) = P(S > 2.5) = P(Z > 1/(2√2)) ≈ 0.3618.
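The three numbers in Exercise 6.29 can be reproduced directly: S ~ P(2), so the exact tail is 1 − e⁻²(1 + 2 + 2), while the normal approximations use the standard normal cdf with and without the continuity correction:

```python
import math

exact = 1 - math.exp(-2) * (1 + 2 + 2)                  # 1 - 5e^{-2} ≈ 0.3233

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal cdf
no_correction = 1 - Phi(1 / math.sqrt(2))               # ≈ 0.2398
with_correction = 1 - Phi(0.5 / math.sqrt(2))           # ≈ 0.3618
print(exact, no_correction, with_correction)
```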
Exercise 6.30 lim_{n→∞} F_n(x) = F(x), where F is the cdf of U[0, 1).
(d/dx) lim_{n→∞} F_n(x) ≠ lim_{n→∞} (d/dx) F_n(x).
Exercise 6.32 Distribution of points: Poisson with parameter μp.
mean: μp = 1.5. variance: μp = 1.5.
Exercise 6.33 Distribution of the total information sent via the fax:
geometric distribution with expected value 1/(αβ) = 4 × 10⁵.
Exercise 6.34 Expected value: E{S_N} = 1/(pλ). variance: Var{S_N} = 1/(p²λ²).
Exercise 6.35 Expected value of T: 3/λ.
Exercise 6.36 (1) lim_{n→∞} F_n(x) = 1/2. This limit is not a cdf.
(2) F_n(x) → 1. This limit is not a cdf.
(3) Because G_{2n}(x) → 1 and G_{2n+1}(x) → 0, {G_n(x)}_{n=1}^{∞} is not convergent.
Exercise 6.37 F_{Y_n}(y) → u(y − β).
Index

A Almost surely, 17, 110


Absolute central moment, 406 Always convergence, 415
normal variable, 406 Appell’s symbol, 48
Absolute continuity, 29 Ascending factorial, 48
Absolutely continuous function, 32, 169 Associative law, 10
convolution, 32 Asymmetry, 205
Absolute mean, 250 At almost all point, 17
logistic distribution, 251 At almost every point, 17
Absolute mean inequality, 449 Auxiliary variable, 273, 275
Absolute moment, 193, 364 Axiom of choice, 150
Gaussian distribution, 196
tri-variate normal distribution, 366
Absolute value, 177 B
cdf, 177 Basis, 198
pdf, 184 orthonormal basis, 198
Abstract space, 1 Bayes’ theorem, 121, 154, 212, 303
Addition theorem, 118 Bernoulli distribution, 126
Additive class of sets, 96 sum, 428
Additive function, 143 Bernoulli random variable, 428
Additivity, 106 Bernoulli trial, 126
countable additivity, 106 Bertrand’s paradox, 114
finite additivity, 106 Bessel function, 249
Algebra, 94 Beta distribution, 135
generated from C , 95 expected value, 245
σ-algebra, 96 mean, 245
Algebraic behavior, 398 moment, 245
Algebraic number, 85 variance, 245
Almost always, 17, 110 Beta function, 53, 135
Almost always convergence, 416 Bienayme-Chebyshev inequality, 451
Almost certain convergence, 416 Big O, 219
Almost everywhere, 17, 110 Bijection, 22
Almost everywhere convergence, 416 Bijective function, 22
Borel-Cantelli lemma, 417 Bijective mapping, 22
Almost sure convergence, 415 Binary distribution, 126
sufficient condition, 417 Binomial coefficient, 46, 65, 127

© The Editor(s) (if applicable) and The Author(s), under exclusive license 481
to Springer Nature Switzerland AG 2022
I. Song et al., Probability and Random Variables: Theory and Applications,
https://doi.org/10.1007/978-3-030-97679-8

complex space, 63 Ceiling function, 74


Binomial distribution, 127 Central chi-square distribution, 377
cdf, 219 expected value, 378
cf, 207 mean, 378
expected value, 195, 207 mgf, 377
Gaussian approximation, 220 moment, 378
kurtosis, 252 sum, 378
mean, 195, 207 variance, 378
skewness, 252 Central constant, 432
variance, 195, 207 Central F distribution, 384
Binomial expansion, 67 expected value, 384
Binomial random variable, 218 mean, 384
sum, 429 moment, 384
Bi-variate Cauchy distribution, 403 variance, 384
Bi-variate Gaussian distribution, 403 Central limit theorem, 217, 396, 438, 444
Bi-variate isotropic symmetric α-stable distribution, 402
stabledistribution, 402 Lindeberg central limit theorem, 439,
Bi-variate normal distribution, 403 444
Bi-variate random vector, 260 Central moment, 193
Bi-variate t distribution, 383 Central t distribution, 380
Bonferroni inequality, 110 expected value, 410
Boole inequality, 108, 141 mean, 410
Borel-Cantelli lemma, 141, 425 moment, 410
almost everywhere convergence, 417 variance, 410
Borel field, 104, 146 Certain convergence, 415
Borel set, 104, 146 Characteristic equation, 295
Borel σ algebra, 104, 146 Characteristic exponent, 400
Borel σ field, 104 Characteristic function (cf), 199
Bound, 55 binomial distribution, 207
Bounded convergence, 447 geometric distribution, 200
Box, 144 joint cf, 295
Breit-Wigner distribution, 133 marginal cf, 296
Buffon’s needle, 267 moment theorem, 206
negative binomial distribution, 200
negative exponential distribution, 246
C normal distribution, 200
Cantor function, 30 Poisson distribution, 208
Cantor set, 16 random vector, 295
Cantor ternary set, 16 Chebyshev inequality, 449
Cardinality, 17 Chernoff bound, 453
Cartesian product, 14 Chu-Vandermonde convolution, 69
Cauchy criterion, 419 C ∞ function, 41
Cauchy distribution, 133 Class, 4
bi-variate Cauchy distribution, 403 completely additive class, 96
cdf, 170 singleton class, 4
expected value, 196 Classical definition, 112
generalized Cauchy distribution, 398 Class of sets, 4
mean, 196 Closed interval, 4
sum, 455 Closed set, 146
variance, 196 Closure, 24
Cauchy equation, 225 Codomain, 19
Cauchy-Schwarz inequality, 290, 451 Coefficient of variation, 205
Cauchy’s integral theorem, 73 exponential distribution, 205

Poisson distribution, 252 almost certain convergence, 416


Collection, 4 almost everywhere convergence, 416
partition, 7 almost sure convergence, 415
singleton collection, 4 always convergence, 415
Collection of sets, 4 bounded convergence, 447
Combination, 46 certain convergence, 415
Combinatorics, 114 continuity of probability, 140
Commutative law, 9 convergence of multiplication, 448
Complement, 5 dominated convergence, 447
Completely additive class, 96 everywhere convergence, 415
Complex conjugate transpose, 290 in distribution, 421
Complex random vector, 290 in law, 421
Component, 1 in probability, 420
Concave, 450 in the mean square, 419
Conditional cdf, 209 in the r -th mean, 418
Conditional distribution, 208, 300 mean square convergence, 419
normal random vector, 349 monotonic convergence, 447
Conditional expected value, 214, 307 probability function, 443
expected value of conditional expected properties of convergence, 447
value, 307 relationships among convergence, 422
Conditional joint cdf, 300 set, 62
Conditional joint pdf, 300 stochastic convergence, 420
Conditional pdf, 209 sure convergence, 415
Conditional pmf, 208 weak convergence, 421
Conditional probability, 116 with probability 1, 416
Conditional rate of failure, 213, 226 Convex, 450
Conditional variance, 308 Convolution, 32, 203, 276, 296
Conjugate transpose, 290 absolutely continuous function, 32
Conservation law, 271 Chu-Vandermonde convolution, 69
Continuity, 26, 139 singular function, 32
absolute continuity, 29 Vandermonde convolution, 69
expectation, 447 Coordinate transformation, 405
expected value, 447 Correlation, 289
probability, 140 Correlation coefficient, 289
uniform continuity, 26 Correlation matrix, 290
Continuity correction, 221, 443 Countable set, 12
Continuity of expectation, 447 Counting, 45
bounded convergence, 447 Covariance, 289
dominated convergence, 447 Covariance matrix, 290, 291
monotonic convergence, 447 Covering, 144
Continuity of probability, 140 Cumulant, 204
continuity from above, 139 Cumulative distribution function (cdf), 165,
continuity from below, 139 227
limit event, 140 absolute value, 177
Continuous function, 25 binomial distribution, 219
Continuous part, 230 Cauchy distribution, 170
Continuous random variable, 162, 169 complementary standard normal cdf, 218
Continuous random vector, 255 conditional cdf, 209
Continuous sample space, 100 conditional joint cdf, 300
Continuous space, 100 discontinuous function, 242
Contour, 340 double exponential distribution, 170
Convergence, 30, 62 inverse, 176
almost always convergence, 416 inverse cdf, 232

joint cdf, 256 Bernoulli distribution, 126


limit, 458 beta distribution, 135
limiter, 177 binary distribution, 126
linear function, 175 binomial distribution, 127
logistic distribution, 170 bi-variate Cauchy distribution, 403
magnitude, 177 bi-variate Gaussian distribution, 403
marginal cdf, 256 bi-variate isotropic SαS distribution, 402
Poisson distribution, 222 bi-variate normal distribution, 403
Rayleigh distribution, 170 bi-variate t distribution, 383
sign, 186 Breit-Wigner distribution, 133
square, 176 Cauchy distribution, 133, 398
square root, 177 central chi-square distribution, 377
standard normal distribution, 217 central F distribution, 384
Cyclic sum, 367, 393 central t distribution, 380
conditional distribution, 208, 300
de Moivre-Laplace distribution, 133
D difference of two random variables, 278
Decorrelating normal random vector, 355 double exponential distribution, 132, 397
Decreasing sequence, 57 exponential distribution, 132, 226
Degenerate bi-variate normal pdf, 342 gamma distribution, 134
Degenerate multi-variate normal pdf, 347 Gauss distribution, 133
Degenerate tri-variate normal pdf, 347 Gaussian distribution, 133
Degree of freedom, 184, 377 Gauss-Laplace distribution, 133
Delta-convergent sequence, 39 geometric distribution, 127, 226, 252
Delta function, 33 heavy-tailed distribution, 396
de Moivre-Laplace theorem, 220 hypergeometric distribution, 173, 327
de Morgan’s law, 10 impulsive distribution, 396
Density function, 130 inverse of central F random variable, 385
Denumerable set, 12 Laplace distribution, 132, 397
Diagonal matrix, 293 lattice distribution, 231
Difference, 8 limit distribution, 378
geometric sequence, 80 limit of central F distribution, 385, 386
symmetric difference, 8 logistic distribution, 134, 251
two random variables, 278 log-normal distribution, 183
Discontinuity, 27 long-tailed distribution, 396
jump discontinuity, 27, 31, 168 Lorentz distribution, 133
type 1 discontinuity, 27 multinomial distribution, 259, 316
type 2 discontinuity, 27 negative binomial distribution, 129
Discontinuous part, 230 negative exponential distribution, 246
Discrete combined space, 100 non-central chi-square distribution, 379
Discrete part, 230 non-central F distribution, 388
Discrete random variable, 162 non-central t distribution, 382
Discrete random vector, 255 normal distribution, 133, 217
joint pmf, 262 Pascal distribution, 129
marginal pmf, 262 Poisson distribution, 128, 222
Discrete sample space, 100 Polya distribution, 129
Discrete space, 100 product of two random variables, 278
unit step function, 36 ratio of two normal random variables,
Disjoint, 7 342
Dispersion parameter, 400 ratio of two random variables, 279
Distance, 23 Rayleigh distribution, 133, 252
Distance function, 23 rectangular distribution, 131
Distribution, 36, 106, 164 second Laplace distribution, 133

stable distribution, 400, 438 central t distribution, 410


standard normal distribution, 217 conditional expected value, 214, 307
sum of two random variables, 276, 282 continuity, 447
two-point distribution, 126 double exponential distribution, 245
uniform distribution, 127, 131 expected value of conditional expected
Distribution function, 165 value, 307
Distributive law, 10 exponential distribution, 194
Domain, 19 gamma distribution, 251
Dominated convergence, 447 Gaussian distribution, 195
Double exponential distribution, 132 geometric distribution, 250
cdf, 170 hypergeometric distribution, 251
expected value, 245 magnitude, 250
mean, 245 negative binomial distribution, 250
variance, 245 non-central chi-square distribution, 379
Dumbbell-shaped region, 68 non-central F distribution, 388
non-central t distribution, 383
Poisson distribution, 195
E sample mean, 371
Element, 1 sign, 250
Elementary event, 101 uniform distribution, 194
Elementary outcome, 100 Exponential behavior, 398
Elementary set, 144 Exponential distribution, 132, 224, 226
Ellipse, 412 coefficient of variation, 205
Empty set, 3 expected value, 194
Enclosure, 24 failure rate function, 226
Ensemble average, 190 hazard rate function, 226
Enumerable set, 12 kurtosis, 206
Equality, 2 Markovian property, 224
in distribution, 226, 422 mgf, 207
Equivalence, 17 random number generation, 245
Equivalence theorem, 18 rate, 132
Error function, 217 skewness, 205
Euclidean space, 144 standard exponential pdf, 132
Eulerian integral of the first kind, 53 sum, 429
Eulerian integral of the second kind, 54 variance, 194
Euler reflection formula, 50, 72
Euler’s integral formula, 69
Even function, 41 F
Event, 101 Factorial, 46
elementary event, 101 ascending factorial, 48
independent, 123 falling factorial, 46
Event space, 101 rising factorial, 48
largest event space, 102 upper factorial, 48
smallest event space, 102 Factorization property, 309
Everywhere convergence, 415 Failure rate function, 226
Expectation, 190 Falling factorial, 46
continuity, 447 Family, 4
Expected value, 190, 287 singleton family, 4
beta distribution, 245 Family of sets, 4
binomial distribution, 195, 207 Fat Cantor set, 17
Cauchy distribution, 196 Fibonacci number, 76
central chi-square distribution, 378 Field, 94
central F distribution, 384 Borel field, 104

generated from C , 95 one-to-one mapping, 21, 22


sigma field, 96 onto function, 21
Finite additivity, 106 pdf, 130
Finitely μ-measurable set, 145 piecewise continuous function, 35
Finitely often (f.o.), 61 pmf, 125
Finite random variable, 162 Pochhammer function, 48
Finite set, 3 probability density function, 130
Floor function, 76, 128 probability mass function, 125
Fourier series, 208 set function, 19, 93
Fourier transform, 199 simple function, 147
step function, 87 singular function, 29, 169
Fubini’s theorem, 82 step function, 33
Function, 19 step-like function, 169, 229
absolutely continuous function, 169 surjection, 21
additive function, 143 surjective function, 21
Bessel function, 249 test function, 41
bijective function, 22 Thomae’s function, 28
Cantor function, 30 unit step function, 33
cdf, 165 Function of impulse function, 43
ceiling function, 74 Fuzzy set, 2
cf, 199
characteristic function, 199
C ∞ function, 41 G
continuous function, 25 Gambler’s ruin problem, 157
cumulative distribution function, 165 Gamma distribution, 134
delta function, 33 expected value, 251
distance function, 23 mean, 251
distribution function, 165 mgf, 251
error function, 217 sum, 455
floor function, 76, 128 variance, 251
function of impulse function, 43 Gamma function, 47
gamma function, 47 Gauss function, 76
Gauss function, 76 Gaussian approximation, 220
generalized function, 36 binomial distribution, 220
Gödel pairing function, 86 multinomial distribution, 317
Heaviside function, 33 Gaussian distribution, 133
hypergeometric function, 69, 364 absolute moment, 196
impulse function, 33, 36 bi-variate Gaussian distribution, 403
increasing singular function, 31 cf, 200
injection, 21 expected value, 195
injective function, 21 generalized Gaussian distribution, 397
into function, 21 moment, 196
Kronecker delta function, 293 standard Gaussian distribution, 133
Kronecker function, 293 standard Gaussian pdf, 133
Lebesgue function, 30 sum, 429
max, 35 variance, 195
measurable function, 147, 162 Gaussian noise, 396
membership function, 2 Gaussian random vector, 337, 338
mgf, 202 bi-variate Gaussian random vector, 339
min, 35 multi-dimensional Gaussian random
moment generating function, 202 vector, 338
one-to-one correspondence, 22 General formula, 360
one-to-one function, 21 joint moment, 362, 369

  moment, 360
Generalized Bienayme-Chebyshev inequality, 451
Generalized Cauchy distribution, 398
  moment, 410
Generalized central limit theorem, 396
Generalized function, 36
Generalized Gaussian distribution, 397
  moment, 409
Generalized normal distribution, 397
  moment, 409
Geometric distribution, 127, 226, 252
  cf, 200
  expected value, 250
  mean, 250
  skewness, 205
  sum, 455
  variance, 250
Geometric sequence, 80
  difference, 80
Gödel pairing function, 86
Greatest lower bound, 55

H
Hagen-Rothe identity, 69
Half-closed interval, 4
Half mean, 250, 334, 364
  logistic distribution, 251
Half moment, 250, 364
Half-open interval, 4
Half-wave rectifier, 242, 335
Hazard rate function, 226
Heaviside convergence sequence, 33
Heaviside function, 33
Heaviside sequence, 33
Heavy-tailed distribution, 396
Heine-Cantor theorem, 27
Heredity, 99
Hermitian adjoint, 290
Hermitian conjugate, 290
Hermitian matrix, 292
Hermitian transpose, 290
Hölder inequality, 454
Hybrid random vector, 256
Hypergeometric distribution, 173, 327
  expected value, 251
  mean, 251
  moment, 251
  variance, 251
Hypergeometric function, 69, 364

I
Image, 19
  inverse image, 20
  pre-image, 20
Impulse-convergent sequence, 39
Impulse function, 33, 36, 137
  symbolic derivative, 41
Impulse sequence, 39
Impulsive distribution, 396
Incomplete mean, 250, 334, 364
Incomplete moment, 250, 364
Increasing singular function, 31
Independence, 123, 350
  mutual, 124
  pairwise, 124
Independent and identically distributed (i.i.d.), 269
Independent events, 123
  a number of independent events, 124
Independent random vector, 266
  several independent random vectors, 270
  two independent random vectors, 269
Index set, 100
Indicator function, 147
In distribution, 226, 422
Inequality, 108, 448
  absolute mean inequality, 449
  Bienayme-Chebyshev inequality, 451
  Bonferroni inequality, 110
  Boole inequality, 108, 141
  Cauchy-Schwarz inequality, 290, 451
  Chebyshev inequality, 449
  Chernoff bound, 453
  generalized Bienayme-Chebyshev inequality, 451
  Hölder inequality, 454
  Jensen inequality, 450
  Kolmogorov inequality, 452
  Lipschitz inequality, 26
  Lyapunov inequality, 423, 450
  Markov inequality, 425, 449
  Minkowski inequality, 454
  tail probability inequality, 448
  triangle inequality, 454
Infimum, 144
Infinite dimensional vector space, 100
Infinitely often (i.o.), 61
Infinite set, 3
Inheritance, 99
Injection, 21
Injective function, 21
In probability, 109
Integral, 79
  Lebesgue integral, 79, 131, 148, 168
  Lebesgue-Stieltjes integral, 79, 168
  Riemann integral, 131, 148, 168
  Riemann-Stieltjes integral, 79, 168
Intersection, 6
Interval, 4, 144
  closed interval, 4
  half-closed interval, 4
  half-open interval, 4
  open interval, 4
Interval set, 4
Into function, 21
Inverse cdf, 232
Inverse Fourier transform, 87
Inverse image, 20
Inverse of central F random variable, 385
Inverse operation, 9
Isohypse, 340

J
Jacobian, 272
Jensen inequality, 450
Joint cdf, 256
  conditional joint cdf, 300
Joint central moment, 289
Joint cf, 295
Joint mgf, 296
Joint moment, 289, 298
  general formula, 362, 369
  normal random vector, 356, 406
Joint pdf, 257
  conditional joint pdf, 300
Joint pmf, 259
  discrete random vector, 262
Joint random variables, 255
Jump discontinuity, 27, 31, 168

K
Khintchine's theorem, 434
Kolmogorov condition, 435
Kolmogorov inequality, 452
Kolmogorov's strong law of large numbers, 436
Kronecker delta function, 293
Kronecker function, 293
Kronecker lemma, 435
Kurtosis, 205
  binomial distribution, 252
  exponential distribution, 206
  Poisson distribution, 252

L
Laplace distribution, 132
Laplace transform, 202
Largest event space, 102
Lattice distribution, 231
Laws of large numbers, 432
Least upper bound, 55
Lebesgue decomposition theorem, 230
Lebesgue function, 30
Lebesgue integral, 79, 131, 148, 168
Lebesgue length, 454
Lebesgue measure, 131, 146
Lebesgue-Stieltjes integral, 79, 168
Leibnitz's rule, 179
Leptokurtic, 205
L'Hospital's theorem, 408
Limit, 56, 57, 62
  cdf, 458
  central F distribution, 385, 386
  limit inferior, 56
  limit set, 62
  limit superior, 56
  lower limit, 59
  negative binomial pmf, 157
  pdf, 458
  random variable, 415
  upper limit, 60
Limit distribution, 378
  central F distribution, 385, 386
  inverse of F random variable, 387
Limiter, 177
  cdf, 177
Limit event, 138
  continuity of probability, 140
  probability, 138, 139
Limit in the mean (l.i.m.), 419
Limit inferior, 56
Limit of central F distribution, 385, 386
Limit point, 24
Limit set, 57, 62
  convergence, 62
  monotonic sequence, 57
Limit superior, 56
Lindeberg central limit theorem, 439, 444
Lindeberg condition, 440, 445
Linearly dependent random vector, 293
Linearly independent random vector, 293
Linear transformation, 274
  normal random vector, 351, 355
  random vector, 293
Line mass, 264
Lipschitz constant, 26
Lipschitz inequality, 26
Location parameter, 400
Logistic distribution, 134
  absolute mean, 251
  cdf, 170
  half mean, 251
  moment, 251
Log-normal distribution, 183
Long-tailed distribution, 396
Lorentz distribution, 133
Lower bound, 55, 144
  greatest lower bound, 55
Lower bound set, 59
Lower limit, 59
Lyapunov inequality, 423, 450

M
Magnitude, 177
  cdf, 177
  expected value, 250
  pdf, 184
  variance, 250
Mapping, 19
  bijective function, 22
  one-to-one correspondence, 22
  one-to-one mapping, 22
Marginal cdf, 256
Marginal cf, 296
Marginal mgf, 296
Marginal pdf, 256
Marginal pmf, 256
Markov condition, 433
Markovian property, 224
Markov inequality, 425, 449
Markov theorem, 433
Mass function, 125
Max, 35
  symbolic derivative, 41
Mean, 190
  beta distribution, 245
  binomial distribution, 195, 207
  Cauchy distribution, 196
  central chi-square distribution, 378
  central F distribution, 384
  central t distribution, 410
  double exponential distribution, 245
  gamma distribution, 251
  geometric distribution, 250
  half mean, 250, 334, 364
  hypergeometric distribution, 251
  incomplete mean, 250, 334, 364
  magnitude, 250
  negative binomial distribution, 250
  non-central chi-square distribution, 379
  non-central F distribution, 388
  non-central t distribution, 383
  normal distribution, 195
  Poisson distribution, 195
  sample mean, 371
  sign, 250
  uniform distribution, 194
Mean square convergence, 419
Measurable function, 147, 162
Measurable set, 101
  finitely μ-measurable set, 145
  μ-measurable set, 145
Measurable space, 105, 147
Measure, 93, 143
  Lebesgue measure, 131, 146
  outer measure, 144
Measure space, 147
Measure theory, 93
Measure zero, 17, 110
Median, 189
  uniform distribution, 190
Membership function, 2
Memoryless, 224
Metric, 23
Metric space, 24
Mighty, 159
Mild peak, 205
Min, 35
  symbolic derivative, 41
Minkowski inequality, 454
Mixed probability measure, 137
Mixed random vector, 256
Mixed-type random variable, 162, 169
Mode, 189
  uniform distribution, 190
Moment, 193
  absolute moment, 193
  beta distribution, 245
  central chi-square distribution, 378
  central F distribution, 384
  central moment, 193
  central t distribution, 410
  cumulant, 204
  Gaussian distribution, 196
  general formula, 360
  generalized Cauchy distribution, 410
  generalized Gaussian distribution, 409
  generalized normal distribution, 409
  half moment, 250, 364
  hypergeometric distribution, 251
  incomplete moment, 250, 364
  joint moment, 289
  logistic distribution, 251
  moment theorem, 206, 298
  normal distribution, 196, 249
  partial moment, 250, 364
  Rayleigh distribution, 249
Moment generating function (mgf), 202
  central chi-square distribution, 377
  exponential distribution, 207
  gamma distribution, 251
  joint mgf, 296
  marginal mgf, 296
  non-central chi-square distribution, 379
  non-central t distribution, 383
  random vector, 296
  sample mean, 410
Moment theorem, 206, 298
Monotonic convergence, 447
Monotonic sequence, 57
  limit set, 57
Monotonic set sequence, 57
Multi-dimensional Gaussian random vector, 338
Multi-dimensional normal random vector, 338
Multi-dimensional random vector, 255
Multinomial coefficient, 47, 259
Multinomial distribution, 259, 316
  Gaussian approximation, 317
  Poisson approximation, 316
Multiplication theorem, 118
Multi-variable random vector, 255
Multi-variate random vector, 255
Mutually exclusive, 7

N
Negative binomial distribution, 129, 266
  cf, 200
  expected value, 250
  limit, 157
  mean, 250
  skewness, 205
  sum, 455
  variance, 250
Negative exponential distribution, 246
Neighborhood, 24
Noise, 396
  Gaussian noise, 396
  normal noise, 396
Non-central chi-square distribution, 379
  expected value, 379
  mean, 379
  mgf, 379
  sum, 379
  variance, 379
Non-central F distribution, 388
  expected value, 388
  mean, 388
  variance, 388
Non-central t distribution, 382
  expected value, 383
  mean, 383
  mgf, 383
  variance, 383
Non-decreasing sequence, 57
Non-denumerable set, 15
Non-increasing sequence, 57
Non-measurable set, 101, 149
Normal distribution, 133, 217
  absolute central moment, 406
  bi-variate normal distribution, 403
  bi-variate normal pdf, 383
  complementary standard normal cdf, 218
  degenerate bi-variate pdf, 342
  degenerate multi-variate pdf, 347
  degenerate tri-variate pdf, 347
  generalized normal distribution, 397
  mean, 195
  moment, 196, 249
  multi-variate normal pdf, 338
  standard normal distribution, 133, 157, 217
  sum, 429
  variance, 195
Normalizing constant, 432
Normal matrix, 293
Normal noise, 396
Normal random vector, 337, 338
  bi-variate normal random vector, 339
  conditional distribution, 349
  decorrelation, 355
  joint moment, 356, 406
  linear combination, 356
  linear transformation, 351, 355
  multi-dimensional normal random vector, 338
Null set, 3
Number of partition, 83

O
One-to-one correspondence, 12, 22
One-to-one mapping, 22
One-to-one transformation, 274
  pdf, 179, 274
Onto function, 21
Open interval, 4
Open set, 146
Order statistic, 135, 285, 315, 395
Orthogonal, 289
Orthogonal matrix, 293
Orthonormal basis, 198
Outer measure, 144

P
Pairwise independence, 124
Parallel connection, 124
Partial correlation coefficient, 344
Partial moment, 250, 364
Partition, 7, 82
Pascal distribution, 129
Pascal's identity, 89
Pascal's rule, 89
Pattern, 310, 318
  mean time, 318
Peakedness, 205
Percentile, 189
Permutation, 46
  with repetition, 46
Piecewise continuous function, 35
Platykurtic, 205
Pochhammer function, 48
Pochhammer polynomial, 48
Pochhammer's symbol, 48
Point, 1
Point mass, 263
Point set, 2
Poisson approximation, 221
  multinomial distribution, 316
Poisson distribution, 128, 222, 297, 453
  cdf, 222
  cf, 208
  coefficient of variation, 252
  expected value, 195
  kurtosis, 252
  mean, 195
  skewness, 252
  sum, 455
  variance, 195
Poisson limit theorem, 221
Poisson points, 223
Polar quantizer, 331
Polya distribution, 129
Population mean, 371
Population variance, 371
Positive definite, 293, 355
Positive semi-definite, 200, 292
A posteriori probability, 156
Posterior probability, 156
Power set, 4
  equivalence, 17
  equivalent, 17
Pre-image, 20
Price's theorem, 358
A priori probability, 156
Prior probability, 156
Probability, 106
  a posteriori probability, 156
  a priori probability, 156
  axioms, 106
  classical definition, 112
  conditional probability, 116
  continuity, 140
  limit event, 138, 139
  posterior probability, 156
  prior probability, 156
  relative frequency, 115
Probability density, 93
Probability density function (pdf), 130, 169
  absolute value, 184
  beta pdf, 135
  binomial pdf, 219
  bi-variate Cauchy pdf, 383
  bi-variate Gaussian pdf, 339
  bi-variate normal pdf, 339
  Breit-Wigner pdf, 133
  Cauchy pdf, 133
  central chi-square pdf, 184, 377
  central F pdf, 384
  central t pdf, 380
  chi-square pdf, 377
  conditional pdf, 209
  cosine, 244
  double exponential pdf, 132, 397
  exponential function, 183
  exponential pdf, 132
  gamma pdf, 134
  Gaussian pdf, 133
  general transformation, 183
  inverse, 181
  joint pdf, 257
  Laplace pdf, 132
  limit, 458
  linear function, 180
  logistic pdf, 134
  log-normal pdf, 183
  Lorentz pdf, 133
  magnitude, 184
  marginal pdf, 256
  normal pdf, 133
  one-to-one transformation, 179, 274
  Poisson pdf, 222
  product pdf, 258
  Rayleigh pdf, 133
  rectangular pdf, 131
  sign, 186
  sine, 185, 242
  square, 184
  square root, 181
  standard exponential pdf, 132
  standard normal pdf, 157
  tangent, 245
  transformation finding, 187
  uniform pdf, 131
  unimodal pdf, 397
Probability distribution, 106
Probability distribution function, 165
Probability function, 106, 165
  convergence, 443
Probability mass, 93
Probability mass function (pmf), 125, 169
  Bernoulli pmf, 126
  binary pmf, 126
  binomial pmf, 127
  conditional pmf, 208
  geometric pmf, 127
  joint pmf, 259
  marginal pmf, 256
  multinomial pmf, 259
  negative binomial pmf, 129
  Pascal pmf, 129
  Poisson pmf, 128, 222
  Polya pmf, 129
  sign, 186
  two-point pmf, 126
  uniform pmf, 127
Probability measure, 105, 106, 126
  mixed probability measure, 137
Probability space, 107
Product, 6
Product of two random variables, 278
Product pdf, 258
Product space, 100
Proper subset, 3

Q
Quality control, 99
Quantile, 189

R
Radius, 24
Random experiment, 99
  model, 99
  observation, 99
  procedure, 99
Random number generation, 187
  exponential distribution, 245
  Rayleigh distribution, 188
Random Poisson points, 223
Random process, 256
Random sample, 371
Random sum, 429
  expected value, 431
  mgf, 430
  variance, 431
Random variable, 161, 162
  binomial random variable, 218
  continuous random variable, 162, 169
  convergence, 415
  discrete random variable, 162, 169
  exponential random variable, 224
  finite random variable, 162
  function of a random variable, 174
  Gaussian random variable, 217
  joint random variables, 255
  limit, 415
  mixed-type random variable, 162, 169
  multinomial random variable, 316
  negative exponential random variable, 246
  normal random variable, 217
  Poisson random variable, 222, 297, 453
  random sum, 429
  sum, 426
  uniformly bounded, 435
  variable sum, 429
Random vector, 255
  bi-variate Gaussian random vector, 339
  bi-variate normal random vector, 339
  bi-variate random vector, 260
  bi-variate t random vector, 383
  cf, 295
  complex random vector, 290
  continuous random vector, 255
  discrete random vector, 255
  Gaussian random vector, 337, 338
  hybrid random vector, 256
  i.i.d. random vector, 269
  independent random vector, 266
  joint cf, 295
  joint mgf, 296
  linearly dependent random vector, 293
  linearly independent random vector, 293
  linear transformation, 293
  mixed random vector, 256
  multi-dimensional Gaussian random vector, 338
  multi-dimensional normal random vector, 338
  multi-dimensional random vector, 255
  multi-variable random vector, 255
  multi-variate random vector, 255
  normal random vector, 337, 338
  several independent random vectors, 270
  two-dimensional normal random vector, 339
  two independent random vectors, 269
  two uncorrelated random vectors, 293
  uncorrelated random vector, 292
Range, 19
Rank statistic, 315
Rate, 132, 224
Ratio of two normal random variables, 342
Ratio of two random variables, 279
Rayleigh distribution, 133, 252
  cdf, 170
  moment, 249
  random number generation, 188
Rectangle, 144
Rectangular distribution, 131
Relative frequency, 115
Reproducing property, 38
Residue theorem, 73
Riemann integral, 131, 148, 168
Riemann-Stieltjes integral, 79, 168
Rising factorial, 48
Rising sequential product, 48
Rotation, 328

S
Sample, 371
  random sample, 371
Sample mean, 371
  expected value, 371
  mgf, 410
  symmetric distribution, 376
  variance, 371
Sample point, 100
Sample space, 100
  continuous sample space, 100
  discrete sample space, 100
  mixed sample space, 100
Sample variance, 372
  symmetric distribution, 376
  variance, 373
Sequence space, 100
Series connection, 124
Set, 1
  additive class of sets, 96
  Borel set, 104, 146
  Cantor set, 16
  Cantor ternary set, 16
  class of sets, 4
  collection of sets, 4
  complement, 5
  convergence, 62
  countable set, 12
  denumerable set, 12
  difference, 8
  elementary set, 144
  empty set, 3
  enumerable set, 12
  equivalence, 17
  family of sets, 4
  fat Cantor set, 17
  finite set, 3
  fuzzy set, 2
  index set, 100
  infinite set, 3
  intersection, 6
  interval, 4
  interval set, 4
  limit set, 57, 62
  lower bound set, 59
  lower limit, 59
  measurable set, 101
  non-denumerable set, 15
  non-measurable set, 101, 149
  null set, 3
  open set, 146
  point set, 2
  power set, 4
  product, 6
  proper subset, 3
  set of integers, 3
  set of measure zero, 17, 110
  set of natural numbers, 3
  set of rational numbers, 12
  set of real numbers, 3
  set of sets, 4
  singleton set, 2
  Smith-Volterra-Cantor set, 17
  subset, 2
  sum, 5
  symmetric difference, 8
  uncountable set, 12, 15
  union, 5
  universal set, 1
  upper bound set, 60
  upper limit, 60
  Vitali set, 105, 107, 150
Set function, 19, 93
Set of sets, 4
Several independent random vectors, 270
Sharp peak, 205
Sifting property, 38
Sigma algebra
  generated from G, 98
Sigma algebra (σ-algebra), 96
Sigma field, 96
  generated from G, 98
Sign, 186
  cdf, 186
  expected value, 250
  mean, 250
  pdf, 186
  pmf, 186
  variance, 250
Sign statistic, 186
Simple function, 147
Simple zero, 43
Sine, 185
  pdf, 185
Singleton class, 4
Singleton family, 4
Singleton set, 2
Singular function, 29, 32, 169
  convolution, 32
  increasing singular function, 31
Skewness, 205
  binomial distribution, 252
  exponential distribution, 205
  geometric distribution, 205
  negative binomial distribution, 205
  Poisson distribution, 252
Small o, 219
Smallest event space, 102
Smith-Volterra-Cantor set, 17
Space, 1
  abstract space, 1
  continuous sample space, 100
  discrete combined space, 100
  discrete sample space, 100
  event space, 101
  finite dimensional vector space, 100
  infinite dimensional vector space, 100
  measurable space, 105, 147
  measure space, 147
  mixed sample space, 100
  probability space, 107
  product space, 100
  sequence space, 100
Span, 231
Stable distribution, 400, 438
  symmetric α-stable, 400
Standard deviation, 194
Standard Gaussian distribution, 133
Standard normal distribution, 133
  cdf, 217
Standard symmetric α-stable distribution, 400
Statistic, 371
  order statistic, 135, 285, 315, 395
  rank statistic, 315
  sign statistic, 186
Statistical average, 190
Statistically independent, 123
Step-convergent sequence, 33
Step function, 33
  Fourier transform, 87
Step-like function, 169, 229
Stepping stones, 74
Stirling approximation, 237
Stirling number, 251
  second kind, 251
Stirling's formula, 237
Stochastic average, 190
Stochastic convergence, 420
Stochastic process, 256
Strictly decreasing sequence, 57
Strictly increasing sequence, 57
Strong law of large numbers, 434
  Borel's strong law of large numbers, 435
  Kolmogorov's strong law of large numbers, 436
Subset, 2
  proper subset, 3
Sum, 5
  Bernoulli random variable, 428
  binomial random variable, 429
  Cauchy random variable, 455
  central chi-square random variable, 378
  cf, 428
  expected value, 426
  exponential random variable, 429
  gamma random variable, 455
  Gaussian random variable, 429
  geometric random variable, 455
  mgf, 428
  negative binomial random variable, 455
  non-central chi-square random variable, 379
  normal random variable, 429
  Poisson random variable, 455
  random sum, 429
  variable sum, 429
  variance, 426
Sum of two random variables, 276, 282
Support, 24
Sure convergence, 415
Surjection, 21
Surjective function, 21
Symbolic derivative, 36
  impulse function, 41
  max, 41
  min, 41
Symbolic differentiation, 36
Symmetric α-stable distribution, 400
  bi-variate isotropic SαS distribution, 402
  standard symmetric α-stable distribution, 400
Symmetric channel, 156
Symmetric difference, 8
Symmetric distribution, 376
  sample mean, 376
  sample variance, 376
Symmetry parameter, 400

T
Tail integral, 217
Tail probability inequality, 448
Tangent, 245
  pdf, 245
Taylor approximation, 459
Taylor series, 208
Test function, 41
Theorem, 3
  addition theorem, 118
  ballot theorem, 155
  Bayes' theorem, 121, 154, 212, 303
  Borel-Cantelli lemma, 141
  Cauchy's integral theorem, 73
  central limit theorem, 217, 438, 444
  de Moivre-Laplace theorem, 220
  equivalence theorem, 18
  Fubini's theorem, 82
  Gauss' hypergeometric theorem, 69
  Heine-Cantor theorem, 27
  Kronecker lemma, 435
  Lebesgue decomposition theorem, 230
  moment theorem, 206
  multiplication theorem, 118
  Poisson limit theorem, 221
  Price's theorem, 358
  residue theorem, 73
  total probability theorem, 120, 210
Thomae's function, 28
Total probability theorem, 120, 210
Transcendental number, 85
Transform
  Fourier transform, 199
  inverse Fourier transform, 87
  Laplace transform, 202
Transformation, 179, 187
  coordinate transformation, 405
  linear transformation, 274
  one-to-one transformation, 179, 274
Transformation finding, 187
Transformation Jacobian, 272
Triangle inequality, 454
Two independent random vectors, 269
Two-point distribution, 126
Two-point pmf, 126
Type 1 discontinuity, 27
Type 2 discontinuity, 27

U
Uncorrelated, 289, 350
Uncorrelated random vector, 292
Uncountable set, 12, 15
Uniform continuity, 26, 200
Uniform distribution, 127, 131
  expected value, 194
  mean, 194
  median, 190
  mode, 190
  variance, 194
Uniformly bounded, 435
Unimodal pdf, 397
Union, 5
Union bound, 154
Union upper bound, 154
Unitary matrix, 293
Unit step function, 33
  discrete space, 36
Universal set, 1
Upper bound, 55
  least upper bound, 55
Upper bound set, 60
Upper factorial, 48
Upper limit, 60

V
Vandermonde convolution, 69
Variable sum, 429
  expected value, 431
  mgf, 430
  variance, 431
Variance, 194
  beta distribution, 245
  binomial distribution, 195, 207
  Cauchy distribution, 196
  central chi-square distribution, 378
  central F distribution, 384
  central t distribution, 410
  conditional variance, 308
  double exponential distribution, 245
  exponential distribution, 194
  gamma distribution, 251
  Gaussian distribution, 195
  geometric distribution, 250
  hypergeometric distribution, 251
  magnitude, 250
  negative binomial distribution, 250
  non-central chi-square distribution, 379
  non-central F distribution, 388
  non-central t distribution, 383
  normal distribution, 195
  Poisson distribution, 195
  sample mean, 371
  sample variance, 372, 373
  sign, 250
  uniform distribution, 194
Variance-covariance matrix, 290
Venn diagram, 5
Vitali set, 105, 107, 150

W
Weak convergence, 421
Weak law of large numbers, 432
With probability 1, 17, 110
With probability 1 convergence, 416
