2022 - Week - 2 - Ch.2 RV and Stochastic Prob


2. Random Variables and Stochastic Process

2.1 Random Variables


Def. 2.1. Given a probability space (Ω, A, P), a random variable X(∙): Ω → R^n is a real- (vector-)valued point function which carries a sample point ω ∈ Ω into a point x ∈ R^n in such a way that every set A ⊂ Ω of the form

A = {ω : X(ω) ≤ x},  x ∈ R^n,

is an element of the σ-algebra A.

 In the textbook this is misspelled as y ∈ R^n.

 Random variable

- is a function that associates a real number x with each sample point, in such a way that every set of the form {ω : X(ω) ≤ x} belongs to A

 Wiki: https://en.wikipedia.org/wiki/Random_variable

Ex.2.4. The Experiment: two coin flips

The sample space Ω={ HH , HT ,TH , TT }

The Event: at least one Head: A={HH , HT , TH }

A candidate random variable X (ω)

X(ω) = { 1,  ω ∈ A
       { 0,  otherwise

We may call this the indicator random variable I_H(ω)

Now the events generated by X(ω) may be defined (as you like):


A_1 = {ω : X(ω) ≤ −1} = ∅,  A_2 = {ω : X(ω) ≤ 0} = {TT},  A_3 = {ω : X(ω) ≤ 1} = Ω

Probability graph in x: P(X = 0) = 1/4, P(X = 1) = 3/4 (plot omitted).

%%% Kim’s comment:

What is the σ-algebra in this case? Well, A contains Ω and ∅, and

A = {Ω, ∅, {TH, HT, HH}, {TT}},

hence we may calculate P(A) for any A ∈ A. %%%%
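Kim’s σ-algebra calculation above can be checked mechanically. A minimal Python sketch, assuming a fair coin (each outcome gets probability 1/4 — an assumption, since the notes do not state fairness):

```python
# Exact probabilities for the two-coin-flip example, assuming a fair coin.
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]
P_point = {w: Fraction(1, 4) for w in omega}       # P on sample points
X = {w: (1 if "H" in w else 0) for w in omega}     # indicator: at least one head

def P(event):
    """Probability of an event (a set of sample points)."""
    return sum(P_point[w] for w in event)

A1 = {w for w in omega if X[w] <= -1}   # empty set
A2 = {w for w in omega if X[w] <= 0}    # {TT}
A3 = {w for w in omega if X[w] <= 1}    # all of Omega

print(P(A1), P(A2), P(A3))   # 0, 1/4, 1
```

Using exact `Fraction` arithmetic avoids any floating-point fuzz in the probabilities.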

Def. 2.7. In general, A ⊂ Ω is an atom of A if A ∈ A and no subset of A is an element of A other than A and ∅.

Hence { TH , HT , HH } , {TT } are atoms of A

%%% Kim’s comment :

See the notation A ⊂Ω , A ∈ A %%%

2.2 Prob. Distribution Function

 Probability Distribution function F ( x)

F ( x )=F X ( x )=P ( { ω : X ( ω ) ≤ x } )

Def.2.10. A real random variable X(ω) is said to be discrete if there exists a finite or countable set S = {x_j} such that

Σ_{x_j ∈ S} P({ω : X(ω) = x_j}) = 1.

2.3 Prob. Density Function

Suppose there exists f(x) such that

F(b) = ∫_{−∞}^{b} f_X(x) dx

Then f(x) is the probability density function, f_X(x).

Proposition 2.12.

 f(x) ≥ 0, ∀x (since F(x) is nondecreasing)

 ∫_{−∞}^{∞} f(x) dx = 1 = F(∞)

 P({ω : x_1 < X(ω) ≤ x_2}) = F(x_2) − F(x_1) = ∫_{x_1}^{x_2} f(x) dx

 P({ω : x_1 ≤ X(ω) < x_2}) = F(x_2^−) − F(x_1^−)

 P({ω : x_1 < X(ω) < x_2}) = F(x_2^−) − F(x_1)

 If F(x) is continuous at a point x_0, then P({ω : X(ω) = x_0}) = 0

 If F(x) is discontinuous at a point x_0, then

P({ω : X(ω) = x_0}) = F(x_0) − F(x_0^−)

%%% Kim’s comment: in the limit notation,

F(a^−) = lim_{x→a^−} F(x),

the limit from the left, which does not include the point x = a. %%%

%%% Kim’s comment on δ(x)

One of the special functions in mathematics is the δ( ) function. The definition of δ(x) is

∫_{−∞}^{∞} f(x) δ(x) dx = f(0)  (1)

In the shifted form,

∫_{−∞}^{∞} f(x) δ(x − a) dx = f(a)

What is the value of δ(x)? It may be called an impulse function. As you see,

∫_{−∞}^{∞} f(x) δ(x) dx = ∫_{−a}^{a} f(x) δ(x) dx = lim_{a→0} ∫_{−a}^{a} f(x) δ(x) dx = f(0)

Now for any constant A, however large,

lim_{a→0} ∫_{−a}^{a} f(x) A dx = 0, ∀A

So to satisfy (1), the magnitude of δ(x) must be ∞ at x = 0, which is not defined in the real numbers. So in fact the delta function is not a function. We may remember, from system theory, the Laplace transform of the delta function, i.e.,

∫_{0}^{∞} δ(t) e^{−st} dt = e^{−0} = 1

%%%%
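The limiting argument above can be made concrete with a smeared delta δ_a(x) = 1/(2a) on [−a, a] (a rectangle of unit area — my choice of approximation, not from the notes): the weighted integral of a smooth f tends to f(0) as a → 0. A Python sketch using the trapezoidal rule:

```python
# Numerical sketch of the impulse idea: replace delta by a tall, narrow
# rectangle delta_a(x) = 1/(2a) on [-a, a]; the integral of f * delta_a
# tends to f(0) as a -> 0.
import math

def smeared_delta_integral(f, a, n=10001):
    # Trapezoidal rule for the integral of f(x) * (1/(2a)) over [-a, a]
    h = 2 * a / (n - 1)
    xs = [-a + i * h for i in range(n)]
    ys = [f(x) / (2 * a) for x in xs]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

f = math.cos   # smooth test function, f(0) = 1
for a in (1.0, 0.1, 0.001):
    print(a, smeared_delta_integral(f, a))
# the printed values approach f(0) = 1 as a shrinks
```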

 Common (Probability) Distribution Functions for Random Variables

1) The uniform distribution function

F(x) = (x − a)/(b − a),  a ≤ x ≤ b

f(x) = 1/(b − a),  a ≤ x ≤ b

2) The exponential distribution function

F(x) = 1 − e^{−λx},  x ≥ 0

f(x) = λ e^{−λx},  x ≥ 0

3) The Gaussian probability distribution (the normal distribution function)

f(x) = (1/(σ√(2π))) e^{−(x−m)²/(2σ²)}

F(x) = ∫_{−∞}^{x} f(y) dy

For a Gaussian random vector x = (x_1, x_2, …, x_n)^T, the density is

f(x) = (1 / ((2π)^{n/2} |P|^{1/2})) exp[−(1/2)(x − m)^T P^{−1}(x − m)]

where m: the mean vector,

P: the covariance matrix,

|P|: the determinant of P
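As a sanity check, the vector formula should reduce to the scalar density when n = 1 and P = σ². A small Python sketch (the evaluation points and parameters are illustrative values only):

```python
# Check: the scalar Gaussian density equals the n = 1 case of the vector
# formula (2*pi)^(-n/2) * |P|^(-1/2) * exp(-(1/2)(x-m)^T P^{-1} (x-m)).
import math

def gauss1d(x, m, sigma):
    return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def gauss_vec1(x, m, P):
    # vector formula specialized to n = 1, where P = sigma^2 and |P| = P
    return (2 * math.pi) ** -0.5 * P ** -0.5 * math.exp(-0.5 * (x - m) ** 2 / P)

print(gauss1d(0.3, 0.0, 2.0), gauss_vec1(0.3, 0.0, 4.0))  # identical values
```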

2.4 Probabilistic Concepts Applied to Random Variables

 Joint Probability Distribution

F ( x 1 , x 2 , … , x n ) =P ( X 1 ≤ x 1 , X 2 ≤ x 2 , … , X n ≤ x n )

 Marginal Probability

F(x_1, x_2, …, x_k) = P(X_1 ≤ x_1, X_2 ≤ x_2, …, X_k ≤ x_k) = F(x_1, …, x_k, ∞, …, ∞)

 Joint Probability Density

f(x_1, x_2, …, x_n) = ∂^n F(x_1, …, x_n) / (∂x_1 ∂x_2 ⋯ ∂x_n)

 Marginal Probability Density

f(x_1, x_2, …, x_k) = ∂^k F(x_1, …, x_k, ∞, …, ∞) / (∂x_1 ∂x_2 ⋯ ∂x_k)

 The marginal probability distribution

F(x_1, x_2, …, x_k) = ∫_{−∞}^{x_1} ⋯ ∫_{−∞}^{x_k} ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(s_1, s_2, …, s_n) ds_1 ⋯ ds_n

%%% Kim’s Example

Given (X, Y) with PDF f(x, y), find the marginals f(x), f(y):

f(x) = ∫_{−∞}^{∞} f(x, s) ds,  f(y) = ∫_{−∞}^{∞} f(s, y) ds

OK. That is, by definition, the continuous case. In the discrete case: given a group of people, there are two experiments, measuring temperature and pulse rate. Let us denote by ω one person.

T H = { ω :high temperature } ,T L= { ω :low temperature }

P H ={ ω :high pulse rate } , P L ={ω :low pulserate }

The probabilities of the outcomes are

P(T_H ∩ P_H) = 0.4,  P(T_L ∩ P_H) = 0.2

P(T_H ∩ P_L) = 0.3,  P(T_L ∩ P_L) = 0.1

Then the marginal probabilities are

P ( T H )=P ( T H ∩ P H ) + P ( T H ∩ P L ) =0.4+0.3=0.7

And

P(T_L) = P(T_L ∩ P_H) + P(T_L ∩ P_L) = 0.2 + 0.1 = 0.3

You may compare this with the continuous case. %%%
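The marginalization in this discrete example is just row and column sums over the joint table. A Python sketch of the same calculation:

```python
# The temperature / pulse-rate table as a joint distribution; marginals are
# row and column sums, exactly as in the calculation above.
joint = {
    ("TH", "PH"): 0.4, ("TL", "PH"): 0.2,
    ("TH", "PL"): 0.3, ("TL", "PL"): 0.1,
}

def marginal_T(t):
    return sum(p for (ti, _), p in joint.items() if ti == t)

def marginal_P(pr):
    return sum(p for (_, pi), p in joint.items() if pi == pr)

print(marginal_T("TH"), marginal_T("TL"))   # approx 0.7 and 0.3
print(marginal_P("PH"), marginal_P("PL"))   # approx 0.6 and 0.4
```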

%%% Kim’s Example for continuous case


Given the joint probability density

f(x, y) = { 2,  0 < x, 0 < y, x + y < 1
          { 0,  otherwise
1) Is it a PDF (or CDF)?

One necessary condition is

(figure: the triangular support region 0 < x, 0 < y, x + y < 1; plot omitted)

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^1 ∫_0^{1−y} 2 dx dy = ∫_0^1 2(1 − y) dy = 1

2) Find the marginal densities f(x), f(y)

f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^{1−y} 2 dx = 2(1 − y),  0 < y < 1

f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^{1−x} 2 dy = 2(1 − x),  0 < x < 1

3) Is (X, Y) independent? No, since

2 = f(x, y) ≠ f(x) f(y) = 4(1 − x)(1 − y)
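Both the normalization and the dependence check can be verified numerically. A Python sketch, doing the inner x-integral in closed form and the outer y-integral by the midpoint rule:

```python
# Numeric check that f(x, y) = 2 on the triangle {x > 0, y > 0, x + y < 1}
# integrates to 1, and that the joint density is not the product of marginals.
def integrate_joint(n=2000):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h            # midpoint in y
        total += 2.0 * (1.0 - y) * h  # inner x-integral done exactly: 2*(1-y)
    return total

print(integrate_joint())   # close to 1

x, y = 0.2, 0.3
print(2.0, (2 * (1 - x)) * (2 * (1 - y)))  # 2 vs 2.24: densities differ, so X, Y dependent
```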
 HA_2_1:

f(x, y) = { 2x + y,  0 < x < 2, 0 < y < 1
          { 0,  otherwise

(figure: sketch of f(x, y) over its support; omitted)

1) Is this a PDF?

2) Find the marginal densities f(x) and f(y)

%%%

Def 2.16. Two random variables X and Y are called independent if any event of the form X(ω) ∈ A is independent of any event of the form Y(ω) ∈ B, where A, B are sets in R^n.

 Fact

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
 The joint probability distribution

F ( x , y ) =P ( X ≤ x ,Y ≤ y )=P ( X ≤ x ) P ( Y ≤ y )=F ( x ) F ( y )
 The joint probability density function

f_XY(x, y) = ∂²F(x, y)/(∂x ∂y) = (∂F_X(x)/∂x)(∂F_Y(y)/∂y) = f_X(x) f_Y(y)

2.5 Functions of a Random Variable -skip

Y(ω) = g(X(ω)) has the density function

f_Y(y) = f_X(g^{−1}(y)) |J(y)|

where |J(y)| stands for the absolute value of the determinant of the matrix

J(y) = [ ∂g_1^{−1}/∂y_1 ⋯ ∂g_n^{−1}/∂y_1
              ⋮         ⋱        ⋮
         ∂g_1^{−1}/∂y_n ⋯ ∂g_n^{−1}/∂y_n ]  evaluated at Y = y

2.6 Expectations and Moments of a Random Variable

Def.

 The mean

E[X] = ∫_{−∞}^{∞} x f(x) dx  (a)

 The sample mean


m_n = (1/n) Σ_{k=1}^{n} X_k  (1)
%% The sample mean is a random variable! It is an estimator of the mean of the random variable X. If the X_k are independent, identically distributed (i.i.d.) random variables, i.e.,

E(X_k) = m ∀k,

Then the mean of the sample mean is

E(m_n) = E((1/n) Σ_{k=1}^{n} X_k) = (1/n) Σ_{k=1}^{n} E(X_k) = (1/n)(nm) = m  (b)

%%%Kim’s Comment

What is the difference between (a) and (b)? In order to use (a), one needs to know the probability density function, whereas in (b) it is not needed.

%%%
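A quick Monte Carlo illustration of (b): the sample mean of i.i.d. draws concentrates near the true mean m. A Python sketch using uniform [0, 1] draws, where m = 1/2 (my choice of distribution, matching Examp. 2.19 below):

```python
# The sample mean m_n as an estimator: the average of n i.i.d. uniform [0, 1]
# draws should be close to the true mean m = 1/2.
import random

random.seed(0)
n = 100_000
m_n = sum(random.random() for _ in range(n)) / n
print(m_n)   # close to 0.5
```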
Examp. 2.19. X is uniformly distributed from 0 to 1,i.e.,

f(x) = { 1,  0 ≤ x ≤ 1
       { 0,  otherwise

Then

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_0^1 x dx = 1/2

Examp. 2.22 The expectation of the value of one roll of one die?

Properties

1) The operator of expectation is linear

E ( aX +bY )=aE ( X )+ bE(Y )


2) The square mean / second moment

E(X²) = ∫_{−∞}^{∞} x² f(x) dx

3) The higher order moment



E(X^n) = ∫_{−∞}^{∞} x^n f(x) dx

4) The variance

var(X) = E[(X − E(X))²] = E(X²) − E(X)²

5) The standard deviation

σ X = √ var (X )

6) The sample variance


σ_n² = (1/(n−1)) Σ_{k=1}^{n} (X_k − m_n)²

This is a random variable, and it is an unbiased estimator of σ_X².

%%% Kim’s comment: biased and un-biased estimator

What is an estimator? Let X be a RV. I want to find a constant C that represents the RV in some sense.

We may call C an estimator of the RV X. So there may be as many estimators as you like.

We may classify estimators as

1) Unbiased estimator / biased estimator

If C = E(X), then C is the unbiased estimator; otherwise it is a biased estimator.

2) The minimum variance estimator /the least square error estimator


C = arg min_a E(X − a)²

3) The mean of X is the minimum variance estimator / the least square error estimator.

arg min_a E(X − a)² = E(X)  (c)

Proof:

d/da (E(X − a)²) = d/da (E(X²) + a² − 2a E(X)) = 2a − 2E(X) = 0

 a = E(X), which minimizes (c).
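The proof can be illustrated numerically: a grid search for the a minimizing E(X − a)² lands on E(X). A Python sketch for one fair die (a hypothetical example of mine, grid step 0.01):

```python
# Numeric check that a = E(X) minimizes E(X - a)^2, for a small discrete X.
xs = [1, 2, 3, 4, 5, 6]          # one fair die, P = 1/6 each
mean = sum(xs) / len(xs)         # 3.5

def mse(a):
    return sum((x - a) ** 2 for x in xs) / len(xs)

best_a = min((a / 100 for a in range(0, 701)), key=mse)
print(mean, best_a)   # both 3.5
```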

Examp. 2.24 The uniformly distributed random variable X on [0, 1]:

E(X) = ∫_0^1 x dx = 1/2,  E(X²) = ∫_0^1 x² dx = 1/3

The variance is

var(X) = E(X²) − E(X)² = 1/3 − 1/4 = 1/12
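These moments can be reproduced by simple numerical integration. A Python sketch using the midpoint rule on [0, 1]:

```python
# Check E(X) = 1/2, E(X^2) = 1/3, var = 1/12 for uniform [0, 1] numerically.
def moment(k, n=10000):
    # midpoint rule for the k-th moment of the uniform [0, 1] density
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** k * h for i in range(n))

m1, m2 = moment(1), moment(2)
print(m1, m2, m2 - m1 ** 2)   # ~0.5, ~0.3333, ~0.08333
```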
2.7 Characteristic Functions -skip

Lemma 2.27

E[X^n] = (1/j^n) d^n φ_X(υ)/dυ^n |_{υ=0}
Prop.2.28 If X is a Gaussian random vector with mean, m, and covariance matrix P,
then its characteristic function is

φ_X(υ) = exp(j υ^T m − (1/2) υ^T P υ).
%%% Kim’s comment : correlation

Def : Two R.V. are uncorrelated if

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y) = 0

%%%

Fact: the components of a Gaussian random vector are uncorrelated if its covariance matrix is diagonal

 Prop. 2.29. Uncorrelated Gaussian random variables are independent

Theorem 2.30. If X is a Gaussian random vector with mean m_X and covariance P_X, and if Y = CX + V, where V is a Gaussian random vector with zero mean and covariance P_V, then Y is a Gaussian random vector with mean C m_X and covariance C P_X C^T + P_V.

 Theorem 2.30 (restated): a R.V. X ~ N(m_x, P_X) and another R.V. V ~ N(0, P_V) are independent. Find the mean and covariance of Y = CX + V.

%%% Kim’s comment: the characteristic function is difficult to remember. The textbook uses the characteristic-function method; in this case we may instead apply basic theory.

Sol: Let’s apply the basic definition.

m_y = E(Y) = E(CX + V) = E(CX) + E(V) = C E(X) + E(V) = C m_x

P_Y = E[(Y − m_y)(Y − m_y)^T] = E[Y Y^T] − m_y m_y^T = E[(CX + V)(CX + V)^T] − C m_x m_x^T C^T

E[(CX + V)(CX + V)^T] = E[CX X^T C^T + CX V^T + V X^T C^T + V V^T]

= C E[X X^T] C^T + C E[X] E[V^T] + E[V] E[X^T] C^T + E[V V^T]  (by independence of X and V)

= C (P_X + m_x m_x^T) C^T + P_V  (since E[V] = 0 and E[X X^T] = P_X + m_x m_x^T)

Hence

P_Y = E[(Y − m_y)(Y − m_y)^T] = C (P_X + m_x m_x^T) C^T + P_V − C m_x m_x^T C^T

= C P_X C^T + P_V
- In general, independence implies uncorrelatedness, but not vice versa.

- However, for Gaussian random variables the opposite direction does hold. %%%
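Theorem 2.30 can be spot-checked by Monte Carlo in the scalar case Y = cX + V. A Python sketch with illustrative values c = 2, m_x = 1, P_x = 4, P_v = 9 (all hypothetical):

```python
# Monte Carlo sketch of Theorem 2.30, scalar case: Y = c*X + V with
# X ~ N(m_x, P_x) and V ~ N(0, P_v) independent. The sample mean and
# variance of Y should approach c*m_x and c^2*P_x + P_v.
import random, math

random.seed(1)
c, m_x, P_x, P_v = 2.0, 1.0, 4.0, 9.0
n = 200_000
ys = [c * random.gauss(m_x, math.sqrt(P_x)) + random.gauss(0.0, math.sqrt(P_v))
      for _ in range(n)]
m_y = sum(ys) / n
var_y = sum((y - m_y) ** 2 for y in ys) / (n - 1)
print(m_y, var_y)   # close to c*m_x = 2 and c^2*P_x + P_v = 25
```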

%%% Kim’s comment: covariance matrix

Sometimes, and in most cases in this course, we deal with a random vector whose components are random variables, i.e.,

X = (x, y, z) is a random vector whose components (x, y, z) are random variables. Then the covariance of the random vector X is defined as

Cov(X) = [ cov(x, x)  cov(x, y)  cov(x, z)
           cov(x, y)  cov(y, y)  cov(y, z)
           cov(x, z)  cov(y, z)  cov(z, z) ]

where

cov(x, y) = E[(x − E(x))(y − E(y))],

hence by definition

cov(x, y) = cov(y, x).

Therefore the matrix Cov(X) is a symmetric matrix, i.e.,

Cov(X) = [Cov(X)]^T

The diagonal terms of the covariance matrix are the variances of each random variable.

%%%

 The covariance of an uncorrelated (hence, being Gaussian, independent) Gaussian random vector is a diagonal matrix,

P_X = [ σ_x²  0     0
        0     σ_y²  0
        0     0     σ_z² ]

%%% Kim’s comment: linear matrix theory, similarity transforms

For any positive-semidefinite symmetric matrix M, there is an orthogonal similarity-transform matrix S (with S^{−1} = S^T) such that

diag(Λ) = S M S^{−1} = S M S^T

Hence for the covariance P_X of any (correlated) Gaussian random vector, there exists an S such that

diag(Λ_X) = S P_X S^T

 For any Gaussian random vector, we can find a transformed random vector that is uncorrelated (and hence independent).

 Independence is important for calculating probabilities. You know the Gaussian probability table, but it applies only to scalars. So if you want to calculate a joint probability over correlated components, first find a similarity transform that produces a diagonal covariance matrix. Then you may calculate the joint probability as a product of separate probabilities.

%%%
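The decorrelating transform can be computed explicitly in two dimensions with a single Jacobi rotation. A Python sketch for a hypothetical 2×2 covariance P (hand-rolled, no linear-algebra library):

```python
# Decorrelating a 2x2 covariance by an orthogonal rotation S so that
# S P S^T is diagonal -- a one-step Jacobi eigendecomposition sketch.
import math

P = [[4.0, 1.5],
     [1.5, 2.0]]

# rotation angle that zeroes the off-diagonal of S P S^T
theta = 0.5 * math.atan2(2 * P[0][1], P[0][0] - P[1][1])
c, s = math.cos(theta), math.sin(theta)
S = [[c, s], [-s, c]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

S_T = [[S[0][0], S[1][0]], [S[0][1], S[1][1]]]
D = mat_mul(mat_mul(S, P), S_T)
print(D)   # off-diagonal entries ~ 0; the diagonal entries are the variances
```

The rotation preserves the trace of P, so the two diagonal entries of D sum to var(x) + var(y).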

 The central limit theorem

Theorem 2.31. Let X_1, …, X_n be i.i.d. random variables with finite mean and variance, E[X_k] = m < ∞, E[(X_k − m)²] = σ² < ∞, and denote their sum by Y_n := Σ_{k=1}^{n} X_k. Then the distribution of the normalized sum

Z_n := (Y_n − E[Y_n]) / √var(Y_n) = (Y_n − nm) / (σ√n)

converges to a Gaussian distribution with mean 0 and variance 1 as n → ∞.

- Proof : textbook P.52

- Remarks:
1) See the condition E[X_k] = m < ∞, E[(X_k − m)²] = σ² < ∞: the mean and the variance are constants, while the experiment is repeated many times. For example,

a) A die, fair or not: you roll the same die many times. Then the normalized mean of the sum, Y_n/n = (1/n) Σ_{k=1}^{n} X_k, is (approximately) Gaussian as n → ∞.

2) Some RVs have no mean; then the theorem is not applicable.
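The die example in remark 1a can be simulated. A Python sketch forming the normalized sums Z_n for fair-die rolls and checking that their sample mean and variance approach 0 and 1:

```python
# CLT sketch: normalized sums Z_n = (Y_n - n*m) / (sigma*sqrt(n)) of fair-die
# rolls; their sample mean and variance should be close to 0 and 1.
import random, math

random.seed(2)
m, var = 3.5, 35.0 / 12.0           # mean and variance of one fair die
n, trials = 100, 20_000
zs = []
for _ in range(trials):
    y = sum(random.randint(1, 6) for _ in range(n))
    zs.append((y - n * m) / math.sqrt(var * n))
mean_z = sum(zs) / trials
var_z = sum((z - mean_z) ** 2 for z in zs) / (trials - 1)
print(mean_z, var_z)   # close to 0 and 1
```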

2.8 Conditional Expectations and Conditional Probabilities

 The conditional expectation



E[X|Y] = ∫_{−∞}^{∞} x f(x|y) dx

f(x|y) = f(x, y) / f(y)
 Remarks

- E[X] is a constant, i.e., not a random variable.

- E[X | Y = 2]: if Y is fixed to a constant, then E[X | Y = 2] is a constant.

- E[X | Y]: if Y is a RV, then E[X | Y] is a random variable (a function of Y).

 Iterated expectation (See the proof at p.57 and remember)

E [ X ] =E [ E [ X|Y ] ]
%%% Kim’s comment

E[X] = E_X[X] → needs f_X(x)

E_X[X] = E_Y[E_X[X|Y]] → needs f_Y(y) and E_X[X|Y] (i.e., f_{X|Y}(x|y)),

even if we do not know f_X(x).

I should say, this formula cannot be emphasized too much! This very simple fact has diverse applications: big data, machine learning, and dynamic system analysis. We should remember it.

%%%
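The iterated-expectation identity can be checked on the discrete temperature/pulse table from the earlier example, coding X = 1 for high temperature and Y = 1 for high pulse rate (my encoding):

```python
# Discrete check of E[X] = E[E[X | Y]] on the temperature/pulse joint table.
joint = {(1, 1): 0.4, (0, 1): 0.2, (1, 0): 0.3, (0, 0): 0.1}  # (x, y): prob

E_X = sum(x * p for (x, _), p in joint.items())               # direct: 0.7

E_iter = 0.0
for y in (0, 1):
    p_y = sum(p for (_, yi), p in joint.items() if yi == y)
    e_x_given_y = sum(x * p for (x, yi), p in joint.items() if yi == y) / p_y
    E_iter += e_x_given_y * p_y
print(E_X, E_iter)   # both 0.7 (up to float rounding)
```

Note that computing E_iter never used f_X(x) directly, only f_Y(y) and f_{X|Y}(x|y), exactly as the comment says.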

 Lemma 2.34.

1) E[X|Y] = E[X] if X and Y are independent

2) E[g(Y) X | Y] = g(Y) E[X | Y]

2.9 Stochastic Process


 Def. 2.36. A stochastic process is a family of random variables, X ( ω , t ),
indexed by a real parameter t ∈ T and defined on a common probability
space ( Ω, A , P).

%%% Kim’s comment

A stochastic process (or random process) is a time-varying random variable, i.e., for any fixed t the process is a random variable.

%%%

 Ex. 2.37

X(ω, t) = A(ω) sin t,  A(ω) ~ U[−1, 1]
 Def. 2.38.

1) A stochastic process X(ω, t) is said to be continuous in probability at t if

lim_{s→t} P({ω : |X(ω, t) − X(ω, s)| ≥ ε}) = 0

for all ε > 0

2) Skip: A stochastic process X(ω, t) is said to be separable if there exists a countable, dense set S ⊂ T such that for any closed set K ⊂ [−∞, ∞] the two sets

A_1 = {ω : X(ω, t) ∈ K ∀t ∈ T},  A_2 = {ω : X(ω, t) ∈ K ∀t ∈ S},

differ by a set A_0 such that P(A_0) = 0.

 Skip: Theorem 2.40. The rational numbers in T provide a separating set S.

 Def. 2.42. Let X be a random process defined on the time interval T. Let t_0 < t_1 < … < t_n be a partition of the time interval T. If the increments X(t_k) − X(t_{k−1}) are mutually independent for any partition of T, then X is said to be a process with independent increments.

 Def. 2.43 We say that a random process X is a Gaussian process if for every finite collection X_{t_1}, X_{t_2}, …, X_{t_n}, the corresponding density function

f(x_1, x_2, …, x_n)

is a Gaussian density function.

 Def. 2.44 Equivalently, a random process X is a Gaussian process if every finite linear combination of the form

Y = Σ_{j=1}^{N} α_j X(t_j)

is a Gaussian random variable.

 Def 2.45. A random process {X_t, t ∈ T}, where T is a subset of the real line, is said to be a Markov process if for any increasing collection t_1 < t_2 < … < t_n ∈ T,

P(X_{t_n} ≤ x_n | X_{t_{n−1}} = x_{n−1}, …, X_{t_1} = x_1) = P(X_{t_n} ≤ x_n | X_{t_{n−1}} = x_{n−1})

or, equivalently,

F_{X_{t_n} | X_{t_1}, …, X_{t_{n−1}}}(x_n | x_1, …, x_{n−1}) = F_{X_{t_n} | X_{t_{n−1}}}(x_n | x_{n−1}).

2.10 Gauss-Markov Processes – The fundamental

1) Dynamics

x k+1=Φk x k + wk ( 2.36)

- State x_k; Φ_k is a known matrix; w_k is a Gaussian random sequence.

2) Given Conditions

a) Noise

E[w_k] = w̄_k

E[(w_k − w̄_k)(w_l − w̄_l)^T] = W_k δ_kl

where

δ_kl = { 1, k = l
       { 0, k ≠ l

b) The states

E[x_0] = x̄_0

E[(x_0 − x̄_0)(x_0 − x̄_0)^T] = P_0

c) The correlation

E[(x_0 − x̄_0)(w_k − w̄_k)^T] = 0 ∀k  (2.37)

which implies

E[(x_k − x̄_k)(w_j − w̄_j)^T] = 0 ∀j ≥ k

3) The mean and covariance

 The mean

x̄_{k+1} = Φ_k x̄_k + w̄_k  (2.38)

 The covariance

P_{k+1} = Φ_k P_k Φ_k^T + W_k
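The mean and covariance recursions can be iterated directly. A scalar Python sketch with illustrative values φ = 0.9, W = 1 (hypothetical), showing the covariance approach its steady state W/(1 − φ²):

```python
# Scalar Gauss-Markov sketch: propagate mean and covariance through
# x_{k+1} = phi * x_k + w_k using the recursions
#   mean:       x_bar <- phi * x_bar + w_bar
#   covariance: P     <- phi * P * phi + W
phi, w_bar, W = 0.9, 0.0, 1.0
x_bar, P = 2.0, 0.5                 # initial mean and covariance

for _ in range(50):
    x_bar = phi * x_bar + w_bar
    P = phi * P * phi + W

print(x_bar, P)   # mean decays toward 0; P approaches W/(1 - phi^2) ~ 5.263
```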
2.11 Non-linear Stochastic Difference Equations  skip
