2022 - Week - 2 - Ch.2 RV and Stochastic Prob


2. Random Variables and Stochastic Process

2.1 Random Variables


Def. 2.1. Given a probability space (Ω, A, P), a random variable X(∙): Ω → R^n is a real- (vector-)valued point function which carries a sample point ω ∈ Ω into a point x ∈ R^n in such a way that every set A ⊂ Ω of the form

A = {ω : X(ω) ≤ x},  x ∈ R^n,

is an element of the σ-algebra A.

 In the textbook this is misspelled as y ∈ R^n.

 Random variable

- is a function that associates a real number x with each sample point, in such a way that every set of the form {ω : X(ω) ≤ x} belongs to A

 Wiki: https://en.wikipedia.org/wiki/Random_variable

Ex.2.4. The Experiment: two coin flips

The sample space Ω={ HH , HT ,TH , TT }

The Event: at least one Head: A={HH , HT , TH }

A candidate random variable X (ω)

X(ω) = { 1,  ω ∈ A
       { 0,  otherwise

We may call this the indicator random variable I_H(ω)

Now the events generated by X(ω) may be defined (as you like):


A_1 = {ω : X(ω) ≤ −1} = ∅,  A_2 = {ω : X(ω) ≤ 0} = {TT},  A_3 = {ω : X(ω) ≤ 1} = Ω

Probability graph in x: P(X = 0) = 1/4, P(X = 1) = 3/4 (plot omitted).

%%% Kim’s comment:

What is the σ-algebra in this case? Well, A contains Ω and ∅, and

A = {Ω, ∅, {TH, HT, HH}, {TT}},

hence we may calculate P(A) for any A ∈ A. %%%%
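Kim’s σ-algebra calculation above can be checked mechanically. A minimal Python sketch, assuming a fair coin (each outcome gets probability 1/4 — an assumption, since the notes do not state fairness):

```python
# Exact probabilities for the two-coin-flip example, assuming a fair coin.
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]
P_point = {w: Fraction(1, 4) for w in omega}       # P on sample points
X = {w: (1 if "H" in w else 0) for w in omega}     # indicator: at least one head

def P(event):
    """Probability of an event (a set of sample points)."""
    return sum(P_point[w] for w in event)

A1 = {w for w in omega if X[w] <= -1}   # empty set
A2 = {w for w in omega if X[w] <= 0}    # {TT}
A3 = {w for w in omega if X[w] <= 1}    # all of Omega

print(P(A1), P(A2), P(A3))   # 0, 1/4, 1
```

Using exact `Fraction` arithmetic avoids any floating-point fuzz in the probabilities.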

Def. 2.7. In general, A ⊂ Ω is an atom of A if A ∈ A and no subset of A is an element of A other than A and ∅.

Hence { TH , HT , HH } , {TT } are atoms of A

%%% Kim’s comment :

See the notation A ⊂Ω , A ∈ A %%%

2.2 Prob. Distribution Function

 Probability Distribution function F ( x)

F ( x )=F X ( x )=P ( { ω : X ( ω ) ≤ x } )

Def.2.10. A real random variable X(ω) is said to be discrete if there exists a finite or countable set S = {x_j} such that

Σ_{x_j ∈ S} P({ω : X(ω) = x_j}) = 1.

2.3 Prob. Density Function

Suppose there exists f(x) such that

F(b) = ∫_{−∞}^{b} f_X(x) dx

Then f(x) is the probability density function, f_X(x).

Proposition 2.12.

 f(x) ≥ 0, ∀x (since F(x) is nondecreasing)

 ∫_{−∞}^{∞} f(x) dx = 1 = F(∞)

 P({ω : x_1 < X(ω) ≤ x_2}) = F(x_2) − F(x_1) = ∫_{x_1}^{x_2} f(x) dx

 P({ω : x_1 ≤ X(ω) < x_2}) = F(x_2^−) − F(x_1^−)

 P({ω : x_1 < X(ω) < x_2}) = F(x_2^−) − F(x_1)

 If F(x) is continuous at a point x_0, then P({ω : X(ω) = x_0}) = 0

 If F(x) is discontinuous at a point x_0, then

P({ω : X(ω) = x_0}) = F(x_0) − F(x_0^−)

%%% Kim’s comment: in the limit notation,

F(a^−) = lim_{x→a^−} F(x),

the limit from the left, which does not include the point x = a. %%%

%%% Kim’s comment on δ(x)

One of the special functions in mathematics is the δ( ) function. The definition of δ(x) is

∫_{−∞}^{∞} f(x) δ(x) dx = f(0)  (1)

In the shifted form,

∫_{−∞}^{∞} f(x) δ(x − a) dx = f(a)

What is the value of δ(x)? It may be called an impulse function. As you see,

∫_{−∞}^{∞} f(x) δ(x) dx = ∫_{−a}^{a} f(x) δ(x) dx = lim_{a→0} ∫_{−a}^{a} f(x) δ(x) dx = f(0)

Now for any constant A, however large,

lim_{a→0} ∫_{−a}^{a} f(x) A dx = 0, ∀A

So to satisfy (1), the magnitude of δ(x) must be ∞ at x = 0, which is not defined in the real numbers. So in fact the delta function is not a function. We may remember, from system theory, the Laplace transform of the delta function, i.e.,

∫_{0}^{∞} δ(t) e^{−st} dt = e^{−0} = 1

%%%%
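The limiting argument above can be made concrete with a smeared delta δ_a(x) = 1/(2a) on [−a, a] (a rectangle of unit area — my choice of approximation, not from the notes): the weighted integral of a smooth f tends to f(0) as a → 0. A Python sketch using the trapezoidal rule:

```python
# Numerical sketch of the impulse idea: replace delta by a tall, narrow
# rectangle delta_a(x) = 1/(2a) on [-a, a]; the integral of f * delta_a
# tends to f(0) as a -> 0.
import math

def smeared_delta_integral(f, a, n=10001):
    # Trapezoidal rule for the integral of f(x) * (1/(2a)) over [-a, a]
    h = 2 * a / (n - 1)
    xs = [-a + i * h for i in range(n)]
    ys = [f(x) / (2 * a) for x in xs]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

f = math.cos   # smooth test function, f(0) = 1
for a in (1.0, 0.1, 0.001):
    print(a, smeared_delta_integral(f, a))
# the printed values approach f(0) = 1 as a shrinks
```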

 Common (Probability) Distribution Functions for Random Variables

1) The uniform distribution function

F(x) = (x − a)/(b − a),  a ≤ x ≤ b

f(x) = 1/(b − a),  a ≤ x ≤ b

2) The exponential distribution function

F(x) = 1 − e^{−λx},  x ≥ 0

f(x) = λ e^{−λx},  x ≥ 0

3) The Gaussian probability distribution (the normal distribution function)

f(x) = (1/(σ√(2π))) e^{−(x−m)²/(2σ²)}

F(x) = ∫_{−∞}^{x} f(y) dy

For a Gaussian random vector x = (x_1, x_2, …, x_n)^T, the density is

f(x) = (1 / ((2π)^{n/2} |P|^{1/2})) exp[−(1/2)(x − m)^T P^{−1}(x − m)]

where m: the mean vector,

P: the covariance matrix,

|P|: the determinant of P
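As a sanity check, the vector formula should reduce to the scalar density when n = 1 and P = σ². A small Python sketch (the evaluation points and parameters are illustrative values only):

```python
# Check: the scalar Gaussian density equals the n = 1 case of the vector
# formula (2*pi)^(-n/2) * |P|^(-1/2) * exp(-(1/2)(x-m)^T P^{-1} (x-m)).
import math

def gauss1d(x, m, sigma):
    return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def gauss_vec1(x, m, P):
    # vector formula specialized to n = 1, where P = sigma^2 and |P| = P
    return (2 * math.pi) ** -0.5 * P ** -0.5 * math.exp(-0.5 * (x - m) ** 2 / P)

print(gauss1d(0.3, 0.0, 2.0), gauss_vec1(0.3, 0.0, 4.0))  # identical values
```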

2.4 Probabilistic Concepts Applied to Random Variables

 Joint Probability Distribution

F ( x 1 , x 2 , … , x n ) =P ( X 1 ≤ x 1 , X 2 ≤ x 2 , … , X n ≤ x n )

 Marginal Probability

F(x_1, x_2, …, x_k) = P(X_1 ≤ x_1, X_2 ≤ x_2, …, X_k ≤ x_k) = F(x_1, …, x_k, ∞, …, ∞)

 Joint Probability Density

f(x_1, x_2, …, x_n) = ∂^n F(x_1, …, x_n) / (∂x_1 ∂x_2 ⋯ ∂x_n)

 Marginal Probability Density

f(x_1, x_2, …, x_k) = ∂^k F(x_1, …, x_k, ∞, …, ∞) / (∂x_1 ∂x_2 ⋯ ∂x_k)

 The marginal probability distribution

F(x_1, x_2, …, x_k) = ∫_{−∞}^{x_1} ⋯ ∫_{−∞}^{x_k} ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(s_1, s_2, …, s_n) ds_1 ⋯ ds_n

%%% Kim’s Example

Given (X, Y) with PDF f(x, y), find the marginals f(x), f(y):

f(x) = ∫_{−∞}^{∞} f(x, s) ds,  f(y) = ∫_{−∞}^{∞} f(s, y) ds

OK. That is, by definition, the continuous case. In the discrete case: given a group of people, there are two experiments, measuring temperature and pulse rate. Let us denote by ω one person.

T H = { ω :high temperature } ,T L= { ω :low temperature }

P H ={ ω :high pulse rate } , P L ={ω :low pulserate }

The probabilities of the outcomes are

P(T_H ∩ P_H) = 0.4,  P(T_L ∩ P_H) = 0.2

P(T_H ∩ P_L) = 0.3,  P(T_L ∩ P_L) = 0.1

Then the marginal probabilities are

P ( T H )=P ( T H ∩ P H ) + P ( T H ∩ P L ) =0.4+0.3=0.7

And

P(T_L) = P(T_L ∩ P_H) + P(T_L ∩ P_L) = 0.2 + 0.1 = 0.3

You may compare this with the continuous case. %%%
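The marginalization in this discrete example is just row and column sums over the joint table. A Python sketch of the same calculation:

```python
# The temperature / pulse-rate table as a joint distribution; marginals are
# row and column sums, exactly as in the calculation above.
joint = {
    ("TH", "PH"): 0.4, ("TL", "PH"): 0.2,
    ("TH", "PL"): 0.3, ("TL", "PL"): 0.1,
}

def marginal_T(t):
    return sum(p for (ti, _), p in joint.items() if ti == t)

def marginal_P(pr):
    return sum(p for (_, pi), p in joint.items() if pi == pr)

print(marginal_T("TH"), marginal_T("TL"))   # approx 0.7 and 0.3
print(marginal_P("PH"), marginal_P("PL"))   # approx 0.6 and 0.4
```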

%%% Kim’s Example for continuous case


Given the joint probability density

f(x, y) = { 2,  0 < x, 0 < y, x + y < 1
          { 0,  otherwise
1) Is it a PDF (or CDF)?

One necessary condition is

(figure: the triangular support region 0 < x, 0 < y, x + y < 1; plot omitted)

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^1 ∫_0^{1−y} 2 dx dy = ∫_0^1 2(1 − y) dy = 1

2) Find the marginal densities f(x), f(y)

f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^{1−y} 2 dx = 2(1 − y),  0 < y < 1

f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^{1−x} 2 dy = 2(1 − x),  0 < x < 1

3) Is (X, Y) independent? No, since

2 = f(x, y) ≠ f(x) f(y) = 4(1 − x)(1 − y)
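Both the normalization and the dependence check can be verified numerically. A Python sketch, doing the inner x-integral in closed form and the outer y-integral by the midpoint rule:

```python
# Numeric check that f(x, y) = 2 on the triangle {x > 0, y > 0, x + y < 1}
# integrates to 1, and that the joint density is not the product of marginals.
def integrate_joint(n=2000):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h            # midpoint in y
        total += 2.0 * (1.0 - y) * h  # inner x-integral done exactly: 2*(1-y)
    return total

print(integrate_joint())   # close to 1

x, y = 0.2, 0.3
print(2.0, (2 * (1 - x)) * (2 * (1 - y)))  # 2 vs 2.24: densities differ, so X, Y dependent
```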
 HA_2_1:

f(x, y) = { 2x + y,  0 < x < 2, 0 < y < 1
          { 0,  otherwise

(figure: sketch of f(x, y) over its support; omitted)

1) Is this a PDF?

2) Find the marginal densities f(x) and f(y)

%%%

Def 2.16. Two random variables X and Y are called independent if any event of the form X(ω) ∈ A is independent of any event of the form Y(ω) ∈ B, where A, B are sets in R^n.

 Fact

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
 The joint probability distribution

F ( x , y ) =P ( X ≤ x ,Y ≤ y )=P ( X ≤ x ) P ( Y ≤ y )=F ( x ) F ( y )
 The joint probability density function

f_XY(x, y) = ∂²F(x, y)/(∂x ∂y) = (∂F_X(x)/∂x)(∂F_Y(y)/∂y) = f_X(x) f_Y(y)

2.5 Functions of a Random Variable -skip

Y(ω) = g(X(ω)) has the density function

f_Y(y) = f_X(g^{−1}(y)) |J(y)|

where |J(y)| stands for the absolute value of the determinant of the matrix

J(y) = [ ∂g_1^{−1}/∂y_1 ⋯ ∂g_n^{−1}/∂y_1
              ⋮         ⋱        ⋮
         ∂g_1^{−1}/∂y_n ⋯ ∂g_n^{−1}/∂y_n ]  evaluated at Y = y

2.6 Expectations and Moments of a Random Variable

Def.

 The mean

E[X] = ∫_{−∞}^{∞} x f(x) dx  (a)

 The sample mean


m_n = (1/n) Σ_{k=1}^{n} X_k  (1)
%% The sample mean is a random variable! It is an estimator of the mean of the random variable X. If the X_k are independent, identically distributed (i.i.d.) random variables, i.e.,

E(X_k) = m ∀k,

Then the mean of the sample mean is

E(m_n) = E((1/n) Σ_{k=1}^{n} X_k) = (1/n) Σ_{k=1}^{n} E(X_k) = (1/n)(nm) = m  (b)

%%%Kim’s Comment

What is the difference between (a) and (b)? In order to use (a), one needs to know the probability density function, whereas in (b) it is not needed.

%%%
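A quick Monte Carlo illustration of (b): the sample mean of i.i.d. draws concentrates near the true mean m. A Python sketch using uniform [0, 1] draws, where m = 1/2 (my choice of distribution, matching Examp. 2.19 below):

```python
# The sample mean m_n as an estimator: the average of n i.i.d. uniform [0, 1]
# draws should be close to the true mean m = 1/2.
import random

random.seed(0)
n = 100_000
m_n = sum(random.random() for _ in range(n)) / n
print(m_n)   # close to 0.5
```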
Examp. 2.19. X is uniformly distributed from 0 to 1,i.e.,

f(x) = { 1,  0 ≤ x ≤ 1
       { 0,  otherwise

Then

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_0^1 x dx = 1/2

Examp. 2.22 The expectation of the value of one roll of one die?

Properties

1) The operator of expectation is linear

E ( aX +bY )=aE ( X )+ bE(Y )


2) The square mean / second moment

E(X²) = ∫_{−∞}^{∞} x² f(x) dx

3) The higher order moment



E(X^n) = ∫_{−∞}^{∞} x^n f(x) dx

4) The variance

var(X) = E[(X − E(X))²] = E(X²) − E(X)²

5) The standard deviation

σ X = √ var (X )

6) The sample variance


σ_n² = (1/(n−1)) Σ_{k=1}^{n} (X_k − m_n)²

This is a random variable, and it is an unbiased estimator of σ_X².

%%% Kim’s comment: biased and un-biased estimator

What is an estimator? Let X be a RV. I want to find a constant C that represents the RV in some sense.

We may call C an estimator of the RV X. So there may be as many estimators as you like.

We may classify estimators as

1) Unbiased estimator / biased estimator

If C = E(X), then C is the unbiased estimator; otherwise it is a biased estimator.

2) The minimum variance estimator /the least square error estimator


C = arg min_a E(X − a)²

3) The mean of X is the minimum variance estimator / the least square error estimator.

arg min_a E(X − a)² = E(X)  (c)

Proof:

d/da (E(X − a)²) = d/da (E(X²) + a² − 2a E(X)) = 2a − 2E(X) = 0

 a = E(X), which minimizes (c).
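The proof can be illustrated numerically: a grid search for the a minimizing E(X − a)² lands on E(X). A Python sketch for one fair die (a hypothetical example of mine, grid step 0.01):

```python
# Numeric check that a = E(X) minimizes E(X - a)^2, for a small discrete X.
xs = [1, 2, 3, 4, 5, 6]          # one fair die, P = 1/6 each
mean = sum(xs) / len(xs)         # 3.5

def mse(a):
    return sum((x - a) ** 2 for x in xs) / len(xs)

best_a = min((a / 100 for a in range(0, 701)), key=mse)
print(mean, best_a)   # both 3.5
```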

Examp. 2.24 The uniformly distributed random variable X on [0, 1]:

E(X) = ∫_0^1 x dx = 1/2,  E(X²) = ∫_0^1 x² dx = 1/3

The variance is

var(X) = E(X²) − E(X)² = 1/3 − 1/4 = 1/12
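These moments can be reproduced by simple numerical integration. A Python sketch using the midpoint rule on [0, 1]:

```python
# Check E(X) = 1/2, E(X^2) = 1/3, var = 1/12 for uniform [0, 1] numerically.
def moment(k, n=10000):
    # midpoint rule for the k-th moment of the uniform [0, 1] density
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** k * h for i in range(n))

m1, m2 = moment(1), moment(2)
print(m1, m2, m2 - m1 ** 2)   # ~0.5, ~0.3333, ~0.08333
```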
2.7 Characteristic Functions -skip

Lemma 2.27

E[X^n] = (1/j^n) d^n φ_X(υ)/dυ^n |_{υ=0}
Prop.2.28 If X is a Gaussian random vector with mean, m, and covariance matrix P,
then its characteristic function is

φ_X(υ) = exp(j υ^T m − (1/2) υ^T P υ).
%%% Kim’s comment : correlation

Def : Two R.V. are uncorrelated if

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y) = 0

%%%

Fact: the components of a Gaussian random vector are uncorrelated if its covariance matrix is diagonal

 Prop. 2.29. Uncorrelated Gaussian random variables are independent

Theorem 2.30. If X is a Gaussian random vector with mean m_X and covariance P_X, and if Y = CX + V, where V is a Gaussian random vector with zero mean and covariance P_V, then Y is a Gaussian random vector with mean C m_X and covariance C P_X C^T + P_V.

 Theorem 2.30 (restated): a R.V. X ~ N(m_x, P_X) and another R.V. V ~ N(0, P_V) are independent. Find the mean and covariance of Y = CX + V.

%%% Kim’s comment: the characteristic function is difficult to remember. The textbook uses the characteristic-function method; in this case we may instead apply basic theory.

Sol: Let’s apply the basic definition.

m_y = E(Y) = E(CX + V) = E(CX) + E(V) = C E(X) + E(V) = C m_x

P_Y = E[(Y − m_y)(Y − m_y)^T] = E[Y Y^T] − m_y m_y^T = E[(CX + V)(CX + V)^T] − C m_x m_x^T C^T

E[(CX + V)(CX + V)^T] = E[CX X^T C^T + CX V^T + V X^T C^T + V V^T]

= C E[X X^T] C^T + C E[X] E[V^T] + E[V] E[X^T] C^T + E[V V^T]  (by independence of X and V)

= C (P_X + m_x m_x^T) C^T + P_V  (since E[V] = 0 and E[X X^T] = P_X + m_x m_x^T)

Hence

P_Y = E[(Y − m_y)(Y − m_y)^T] = C (P_X + m_x m_x^T) C^T + P_V − C m_x m_x^T C^T

= C P_X C^T + P_V
- In general, independence implies uncorrelatedness, but not vice versa.

- However, for Gaussian random variables the opposite direction does hold. %%%
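Theorem 2.30 can be spot-checked by Monte Carlo in the scalar case Y = cX + V. A Python sketch with illustrative values c = 2, m_x = 1, P_x = 4, P_v = 9 (all hypothetical):

```python
# Monte Carlo sketch of Theorem 2.30, scalar case: Y = c*X + V with
# X ~ N(m_x, P_x) and V ~ N(0, P_v) independent. The sample mean and
# variance of Y should approach c*m_x and c^2*P_x + P_v.
import random, math

random.seed(1)
c, m_x, P_x, P_v = 2.0, 1.0, 4.0, 9.0
n = 200_000
ys = [c * random.gauss(m_x, math.sqrt(P_x)) + random.gauss(0.0, math.sqrt(P_v))
      for _ in range(n)]
m_y = sum(ys) / n
var_y = sum((y - m_y) ** 2 for y in ys) / (n - 1)
print(m_y, var_y)   # close to c*m_x = 2 and c^2*P_x + P_v = 25
```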

%%% Kim’s comment: covariance matrix

Sometimes, and in most cases in this course, we deal with a random vector whose components are random variables, i.e.,

X = (x, y, z) is a random vector whose components (x, y, z) are random variables. Then the covariance of the random vector X is defined as

Cov(X) = [ cov(x, x)  cov(x, y)  cov(x, z)
           cov(x, y)  cov(y, y)  cov(y, z)
           cov(x, z)  cov(y, z)  cov(z, z) ]

where

cov(x, y) = E[(x − E(x))(y − E(y))],

hence by definition

cov(x, y) = cov(y, x).

Therefore the matrix Cov(X) is a symmetric matrix, i.e.,

Cov(X) = [Cov(X)]^T

The diagonal terms of the covariance matrix are the variances of each random variable.

%%%

 The covariance of an uncorrelated (hence, being Gaussian, independent) Gaussian random vector is a diagonal matrix,

P_X = [ σ_x²  0     0
        0     σ_y²  0
        0     0     σ_z² ]

%%% Kim’s comment: linear matrix theory, similarity transforms

For any positive-semidefinite symmetric matrix M, there is an orthogonal similarity-transform matrix S (with S^{−1} = S^T) such that

diag(Λ) = S M S^{−1} = S M S^T

Hence for the covariance P_X of any (correlated) Gaussian random vector, there exists an S such that

diag(Λ_X) = S P_X S^T

 For any Gaussian random vector, we can find a transformed random vector that is uncorrelated (and hence independent).

 Independence is important for calculating probabilities. You know the Gaussian probability table, but it applies only to scalars. So if you want to calculate a joint probability over correlated components, first find a similarity transform that produces a diagonal covariance matrix. Then you may calculate the joint probability as a product of separate probabilities.

%%%
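The decorrelating transform can be computed explicitly in two dimensions with a single Jacobi rotation. A Python sketch for a hypothetical 2×2 covariance P (hand-rolled, no linear-algebra library):

```python
# Decorrelating a 2x2 covariance by an orthogonal rotation S so that
# S P S^T is diagonal -- a one-step Jacobi eigendecomposition sketch.
import math

P = [[4.0, 1.5],
     [1.5, 2.0]]

# rotation angle that zeroes the off-diagonal of S P S^T
theta = 0.5 * math.atan2(2 * P[0][1], P[0][0] - P[1][1])
c, s = math.cos(theta), math.sin(theta)
S = [[c, s], [-s, c]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

S_T = [[S[0][0], S[1][0]], [S[0][1], S[1][1]]]
D = mat_mul(mat_mul(S, P), S_T)
print(D)   # off-diagonal entries ~ 0; the diagonal entries are the variances
```

The rotation preserves the trace of P, so the two diagonal entries of D sum to var(x) + var(y).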

 The central limit theorem

Theorem 2.31. Let X_1, …, X_n be i.i.d. random variables with finite mean and variance, E[X_k] = m < ∞, E[(X_k − m)²] = σ² < ∞, and denote their sum by Y_n := Σ_{k=1}^{n} X_k. Then the distribution of the normalized sum

Z_n := (Y_n − E[Y_n]) / √var(Y_n) = (Y_n − nm) / (σ√n)

converges to a Gaussian distribution with mean 0 and variance 1 as n → ∞.

- Proof : textbook P.52

- Remarks:
1) See the condition E[X_k] = m < ∞, E[(X_k − m)²] = σ² < ∞: the mean and the variance are constants, while the experiment is repeated many times. For example,

a) A die, fair or not: you roll the same die many times. Then the normalized mean of the sum, Y_n/n = (1/n) Σ_{k=1}^{n} X_k, is (approximately) Gaussian as n → ∞.

2) Some RVs have no mean; then the theorem is not applicable.
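The die example in remark 1a can be simulated. A Python sketch forming the normalized sums Z_n for fair-die rolls and checking that their sample mean and variance approach 0 and 1:

```python
# CLT sketch: normalized sums Z_n = (Y_n - n*m) / (sigma*sqrt(n)) of fair-die
# rolls; their sample mean and variance should be close to 0 and 1.
import random, math

random.seed(2)
m, var = 3.5, 35.0 / 12.0           # mean and variance of one fair die
n, trials = 100, 20_000
zs = []
for _ in range(trials):
    y = sum(random.randint(1, 6) for _ in range(n))
    zs.append((y - n * m) / math.sqrt(var * n))
mean_z = sum(zs) / trials
var_z = sum((z - mean_z) ** 2 for z in zs) / (trials - 1)
print(mean_z, var_z)   # close to 0 and 1
```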

2.8 Conditional Expectations and Conditional Probabilities

 The conditional expectation



E[X|Y] = ∫_{−∞}^{∞} x f(x|y) dx

f(x|y) = f(x, y) / f(y)
 Remarks

- E[X] is a constant, i.e., not a random variable.

- E[X | Y = 2]: if Y is fixed to a constant, then E[X | Y = 2] is a constant.

- E[X | Y]: if Y is a RV, then E[X | Y] is a random variable (a function of Y).

 Iterated expectation (See the proof at p.57 and remember)

E [ X ] =E [ E [ X|Y ] ]
%%% Kim’s comment

E[X] = E_X[X] → needs f_X(x)

E_X[X] = E_Y[E_X[X|Y]] → needs f_Y(y) and E_X[X|Y] (i.e., f_{X|Y}(x|y)),

even if we do not know f_X(x).

I should say, this formula cannot be emphasized too much! This very simple fact has diverse applications: big data, machine learning, and dynamic system analysis. We should remember it.

%%%
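The iterated-expectation identity can be checked on the discrete temperature/pulse table from the earlier example, coding X = 1 for high temperature and Y = 1 for high pulse rate (my encoding):

```python
# Discrete check of E[X] = E[E[X | Y]] on the temperature/pulse joint table.
joint = {(1, 1): 0.4, (0, 1): 0.2, (1, 0): 0.3, (0, 0): 0.1}  # (x, y): prob

E_X = sum(x * p for (x, _), p in joint.items())               # direct: 0.7

E_iter = 0.0
for y in (0, 1):
    p_y = sum(p for (_, yi), p in joint.items() if yi == y)
    e_x_given_y = sum(x * p for (x, yi), p in joint.items() if yi == y) / p_y
    E_iter += e_x_given_y * p_y
print(E_X, E_iter)   # both 0.7 (up to float rounding)
```

Note that computing E_iter never used f_X(x) directly, only f_Y(y) and f_{X|Y}(x|y), exactly as the comment says.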

 Lemma 2.34.

1) E[X|Y] = E[X] if X and Y are independent

2) E[g(Y) X | Y] = g(Y) E[X | Y]

2.9 Stochastic Process


 Def. 2.36. A stochastic process is a family of random variables, X ( ω , t ),
indexed by a real parameter t ∈ T and defined on a common probability
space ( Ω, A , P).

%%% Kim’s comment

A stochastic process (or random process) is a time-varying random variable, i.e., for any fixed t the process is a random variable.

%%%

 Ex. 2.37

X(ω, t) = A(ω) sin t,  A(ω) ~ U[−1, 1]
 Def. 2.38.

1) A stochastic process X(ω, t) is said to be continuous in probability at t if

lim_{s→t} P({ω : |X(ω, t) − X(ω, s)| ≥ ε}) = 0

for all ε > 0

2) Skip: A stochastic process X(ω, t) is said to be separable if there exists a countable, dense set S ⊂ T such that for any closed set K ⊂ [−∞, ∞] the two sets

A_1 = {ω : X(ω, t) ∈ K ∀t ∈ T},  A_2 = {ω : X(ω, t) ∈ K ∀t ∈ S},

differ by a set A_0 such that P(A_0) = 0.

 Skip: Theorem 2.40. The rational numbers in T provide a separating set S.

 Def. 2.42. Let X be a random process defined on the time interval T. Let t_0 < t_1 < … < t_n be a partition of the time interval T. If the increments X(t_k) − X(t_{k−1}) are mutually independent for any partition of T, then X is said to be a process with independent increments.

 Def. 2.43 We say that a random process X is a Gaussian process if for every finite collection X_{t_1}, X_{t_2}, …, X_{t_n}, the corresponding density function

f(x_1, x_2, …, x_n)

is a Gaussian density function.

 Def. 2.44 Equivalently, a random process X is a Gaussian process if every finite linear combination of the form

Y = Σ_{j=1}^{N} α_j X(t_j)

is a Gaussian random variable.

 Def 2.45. A random process {X_t, t ∈ T}, where T is a subset of the real line, is said to be a Markov process if for any increasing collection t_1 < t_2 < … < t_n ∈ T,

P(X_{t_n} ≤ x_n | X_{t_{n−1}} = x_{n−1}, …, X_{t_1} = x_1) = P(X_{t_n} ≤ x_n | X_{t_{n−1}} = x_{n−1})

or, equivalently,

F_{X_{t_n} | X_{t_1}, …, X_{t_{n−1}}}(x_n | x_1, …, x_{n−1}) = F_{X_{t_n} | X_{t_{n−1}}}(x_n | x_{n−1}).

2.10 Gauss-Markov Processes – The fundamental

1) Dynamics

x k+1=Φk x k + wk ( 2.36)

- State x_k; Φ_k is a known matrix; w_k is a Gaussian random sequence.

2) Given Conditions

a) Noise

E[w_k] = w̄_k

E[(w_k − w̄_k)(w_l − w̄_l)^T] = W_k δ_kl

where

δ_kl = { 1, k = l
       { 0, k ≠ l

b) The states

E[x_0] = x̄_0

E[(x_0 − x̄_0)(x_0 − x̄_0)^T] = P_0

c) The correlation

E[(x_0 − x̄_0)(w_k − w̄_k)^T] = 0 ∀k  (2.37)

which implies

E[(x_k − x̄_k)(w_j − w̄_j)^T] = 0 ∀j ≥ k

3) The mean and covariance

 The mean

x̄_{k+1} = Φ_k x̄_k + w̄_k  (2.38)

 The covariance

P_{k+1} = Φ_k P_k Φ_k^T + W_k
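The mean and covariance recursions can be iterated directly. A scalar Python sketch with illustrative values φ = 0.9, W = 1 (hypothetical), showing the covariance approach its steady state W/(1 − φ²):

```python
# Scalar Gauss-Markov sketch: propagate mean and covariance through
# x_{k+1} = phi * x_k + w_k using the recursions
#   mean:       x_bar <- phi * x_bar + w_bar
#   covariance: P     <- phi * P * phi + W
phi, w_bar, W = 0.9, 0.0, 1.0
x_bar, P = 2.0, 0.5                 # initial mean and covariance

for _ in range(50):
    x_bar = phi * x_bar + w_bar
    P = phi * P * phi + W

print(x_bar, P)   # mean decays toward 0; P approaches W/(1 - phi^2) ~ 5.263
```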
2.11 Non-linear Stochastic Difference Equations  skip
