2022 - Week - 2 - Ch.2 RV and Stochastic Prob
A = {ω : X(ω) ≤ x}, x ∈ ℝⁿ,
is an element of the σ-algebra 𝒜.
Random variable
Wiki: https://en.wikipedia.org/wiki/Random_variable
X(ω) = { 1, ω ∈ A
         0, otherwise
[Figure: probability graph in x, with a jump of 1/4 at x = 0 and a jump of 3/4 at x = 1]
A = {Ω, ∅, {TH, HT, HH}, {TT}},
F ( x )=F X ( x )=P ( { ω : X ( ω ) ≤ x } )
∑_{x_j ∈ S} P({ω : X(ω) = x_j}) = 1.
Proposition 2.12.
f(x) ≥ 0, ∀x (since F(x) is nondecreasing)
∫_{−∞}^{∞} f(x) dx = 1 = F(∞)
P({ω : x₁ < X(ω) ≤ x₂}) = F(x₂) − F(x₁) = ∫_{x₁}^{x₂} f(x) dx
P({ω : x₁ ≤ X(ω) < x₂}) = F(x₂⁻) − F(x₁⁻)
where F(a⁻) := lim_{x↑a} F(x), which does not include a possible jump at x = a.
%%%
∫_{−∞}^{∞} f(x) δ(x) dx = f(0)   (1)
What is the value of δ(x)? It may be called an impulse function. As you can see from
∫_{−∞}^{∞} δ(x) dx = lim_{a→0} ∫_{−a}^{a} δ(x) dx = 1,
to satisfy (1) the magnitude of δ(x) must be ∞ at x = 0, which is not defined as a real number. So in
fact the delta function is not a function. Recall from system theory the Laplace
transform of the delta function, i.e.,
∫_0^∞ δ(t) e^{−st} dt = e^{−s·0} = 1
%%%
1) The uniform distribution function on [a, b]
F(x) = (x − a)/(b − a), a ≤ x ≤ b
f(x) = 1/(b − a), a ≤ x ≤ b
2) The exponential distribution function
F(x) = 1 − e^{−λx}, x ≥ 0
f(x) = λ e^{−λx}, x ≥ 0
3) The Gaussian probability distribution (the normal distribution function)
f(x) = (1/(σ√(2π))) exp(−(x − m)²/(2σ²))
F(x) = ∫_{−∞}^{x} f(y) dy
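As a quick numerical check (my addition, not in the text; the parameter values a = 0, b = 2, λ = 1.5, m = 0, σ = 1 are arbitrary illustrations), the sketch below verifies that each of the three densities above integrates to 1, i.e., F(∞) = 1:

```python
import numpy as np

# Arbitrary illustrative parameters (not from the notes)
a, b = 0.0, 2.0       # uniform on [a, b]
lam = 1.5             # exponential rate
m, sigma = 0.0, 1.0   # Gaussian mean and standard deviation

x = np.linspace(-10, 10, 200001)
uniform = np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)
expon = np.where(x >= 0, lam * np.exp(-lam * np.clip(x, 0, None)), 0.0)
gauss = np.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Each density should integrate to F(inf) = 1
for name, f in [("uniform", uniform), ("exponential", expon), ("gaussian", gauss)]:
    print(name, np.trapz(f, x))  # all approximately 1.0
```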
For a Gaussian random vector x = (x₁, x₂, …, x_n)ᵀ, the density is
f(x) = (1/((2π)^{n/2} |P|^{1/2})) exp[ −(1/2)(x − m)ᵀ P⁻¹ (x − m) ]
where m is the mean vector and P is the covariance matrix.
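A minimal sketch (my addition) evaluating this density directly from the formula; the 2-dimensional m and P below are arbitrary illustrative values:

```python
import numpy as np

def gauss_pdf(x, m, P):
    """Density of N(m, P) at point x, computed from the formula above."""
    n = len(m)
    d = x - m
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(P))
    return np.exp(-0.5 * d @ np.linalg.solve(P, d)) / norm

m = np.array([1.0, -1.0])               # illustrative mean vector
P = np.array([[2.0, 0.5], [0.5, 1.0]])  # illustrative covariance matrix
print(gauss_pdf(np.array([1.0, -1.0]), m, P))  # peak value 1/(2*pi*sqrt(|P|))
```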
F ( x 1 , x 2 , … , x n ) =P ( X 1 ≤ x 1 , X 2 ≤ x 2 , … , X n ≤ x n )
Marginal Probability
F(x₁, x₂, …, x_k) = F(x₁, …, x_k, ∞, …, ∞) = P(X₁ ≤ x₁, X₂ ≤ x₂, …, X_k ≤ x_k)
Given a group of people, consider two measurements: body temperature and pulse rate. Let ω denote one person, with T_H/T_L for high/low temperature and P_H/P_L for high/low pulse rate. Suppose
P(T_H ∩ P_H) = 0.4, P(T_L ∩ P_H) = 0.2
P(T_H ∩ P_L) = 0.3, P(T_L ∩ P_L) = 0.1
P(T_H) = P(T_H ∩ P_H) + P(T_H ∩ P_L) = 0.4 + 0.3 = 0.7
and
P(T_L) = P(T_L ∩ P_H) + P(T_L ∩ P_L) = 0.2 + 0.1 = 0.3
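Marginalization here is just summing the joint table over the other variable; a small sketch (my addition) reproducing the numbers:

```python
# Joint probabilities from the example above, keyed by (temperature, pulse)
joint = {("TH", "PH"): 0.4, ("TL", "PH"): 0.2,
         ("TH", "PL"): 0.3, ("TL", "PL"): 0.1}

# Marginalize out the pulse variable to get P(TH) and P(TL)
p_TH = sum(v for (t, p), v in joint.items() if t == "TH")
p_TL = sum(v for (t, p), v in joint.items() if t == "TL")
print(p_TH, p_TL)  # 0.7 0.3, matching the hand computation
```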
f(x, y) = { 2, 0 < x, 0 < y, x + y < 1
            0, otherwise
1) Is it a PDF (or CDF)?
[Figure: the support of f(x, y), the triangle 0 < x, 0 < y, x + y < 1]
∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^1 ∫_0^{1−x} 2 dy dx = ∫_0^1 2(1 − x) dx = 1, so it is a valid PDF.
2) The marginal density:
f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^{1−x} 2 dy = 2(1 − x), 0 < x < 1
3) Are X and Y independent? No, since
2 = f(x, y) ≠ f_X(x) f_Y(y) = 4(1 − x)(1 − y)
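A numerical sketch (my addition) checking items 1)–3) on a grid:

```python
import numpy as np

# Grid check of the example: f(x, y) = 2 on the triangle 0 < x, 0 < y, x + y < 1
xs = np.linspace(0, 1, 1001)
ys = np.linspace(0, 1, 1001)
X, Y = np.meshgrid(xs, ys)
f = np.where((X > 0) & (Y > 0) & (X + Y < 1), 2.0, 0.0)

# 1) Total mass should be 1
print(np.trapz(np.trapz(f, ys, axis=0), xs))  # ~1.0

# 2) Marginal at x = 0.25 should be 2(1 - x) = 1.5
print(np.trapz(f[:, 250], ys))  # ~1.5

# 3) f(x, y) = 2 on the support, but f_X(x) f_Y(y) = 4(1-x)(1-y): not equal
print(2.0, 4 * (1 - 0.25) * (1 - 0.25))
```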
HA_2_1:
f(x, y) = { 2x + y, 0 < x < 2, 0 < y < 1
            0, otherwise
%%%
1) Is this a PDF?
[Figure: surface plot of f(x, y) over its support]
%%%
Def 2.16. Two random variables X and Y are called independent if any event of the
form X(ω) ∈ A is independent of any event of the form Y(ω) ∈ B, where A, B are
sets in ℝⁿ.
Fact
P ( X ∈ A , Y ∈ B )=P ( X ∈ A ) P (Y ∈ B )
The joint probability distribution
F ( x , y ) =P ( X ≤ x ,Y ≤ y )=P ( X ≤ x ) P ( Y ≤ y )=F ( x ) F ( y )
The joint probability density function
f_XY(x, y) = ∂²F(x, y)/(∂x ∂y) = (∂F_X(x)/∂x)(∂F_Y(y)/∂y) = f_X(x) f_Y(y)
If Y(ω) = g(X(ω)) for an invertible g, then
f_Y(y) = f_X(g⁻¹(y)) |J(y)|
where |J(y)| stands for the absolute value of the determinant of the matrix
J(y) = [ ∂g₁⁻¹/∂y₁  ⋯  ∂g_n⁻¹/∂y₁
         ⋮          ⋱  ⋮
         ∂g₁⁻¹/∂y_n ⋯  ∂g_n⁻¹/∂y_n ] evaluated at Y = y.
Def.
The mean
E[X] = ∫_{−∞}^{∞} x f(x) dx   (a)
If E(X_k) = m ∀k, then for the sample mean m̂_n = (1/n) ∑_{k=1}^n X_k,
E(m̂_n) = E( (1/n) ∑_{k=1}^n X_k ) = (1/n) ∑_{k=1}^n E(X_k) = (1/n)(nm) = m   (b)
%%%Kim’s Comment
What is the difference between (a) and (b)? To use (a), one needs to know the
probability density function, whereas (b) does not need it.
%%%
Examp. 2.19. X is uniformly distributed from 0 to 1, i.e.,
f(x) = { 1, 0 ≤ x ≤ 1
         0, otherwise
Then
E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_0^1 x dx = 1/2
Examp. 2.22. The expectation of the value of one roll of one die? For a fair die, E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 7/2 = 3.5.
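A short sketch (my addition) contrasting (a) and (b) on this example, assuming a fair die: (a) uses the known distribution, while (b) uses only samples:

```python
import numpy as np

# (a) needs the distribution: a fair die puts p = 1/6 on each of {1,...,6}
faces = np.arange(1, 7)
print(faces.mean())  # 3.5 = (1+2+...+6)/6

# (b) needs only samples: the sample mean of many rolls approaches m
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)
print(rolls.mean())  # ~3.5, no density function used
```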
Properties
4) The variance
var(X) = E[(X − E[X])²], with standard deviation σ_X = √var(X).
The sample variance is a random variable, and the unbiased estimator of σ_X² is s² = (1/(n − 1)) ∑_{k=1}^n (X_k − m̂_n)².
What is an estimator? Let X be a RV. We want to find a constant C that represents the RV in some sense.
We may call C an estimator of the RV X, so there may be as many estimators as you like.
3) The mean of X is the minimum variance estimator / the least squares error estimator.
Proof:
d/da E[(X − a)²] = d/da (E[X²] + a² − 2aE[X]) = 2a − 2E[X] = 0
gives a = E(X), which minimizes E[(X − a)²].
E(X²) = ∫_0^1 x² dx = 1/3
The variance is
var(X) = E(X²) − E(X)² = 1/3 − 1/4 = 1/12
2.7 Characteristic Functions -skip
Lemma 2.27
E[Xⁿ] = (1/jⁿ) dⁿφ_X(υ)/dυⁿ |_{υ=0}
Prop.2.28 If X is a Gaussian random vector with mean, m, and covariance matrix P,
then its characteristic function is
φ_X(υ) = exp( jυᵀm − (1/2)υᵀPυ ).
%%% Kim’s comment : correlation
%%%
Theorem 2.30
A R.V. X ~ N(m_X, P_X) and another R.V. V ~ N(0, P_V) are independent. Find the mean and
covariance of Y = CX + V.
%%% Kim's comment: The characteristic function is difficult to remember. The textbook uses
the characteristic-function method. In this case we may apply basic theory instead.
m_Y = E[Y] = E[CX + V] = C E[X] + E[V] = C m_X
Hence
P_Y = E[(Y − m_Y)(Y − m_Y)ᵀ] = E[(C(X − m_X) + V)(C(X − m_X) + V)ᵀ]
    = C P_X Cᵀ + P_V
since X and V are independent, so the cross terms vanish.
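A Monte Carlo sketch (my addition) of Theorem 2.30; C, m_X, P_X, P_V below are arbitrary illustrative values:

```python
import numpy as np

# Monte Carlo check of m_Y = C m_X and P_Y = C P_X C^T + P_V
rng = np.random.default_rng(0)
C = np.array([[1.0, 0.5], [0.0, 2.0]])
m_X = np.array([1.0, -2.0])
P_X = np.array([[1.0, 0.3], [0.3, 2.0]])
P_V = np.diag([0.5, 0.1])

X = rng.multivariate_normal(m_X, P_X, size=200_000)
V = rng.multivariate_normal(np.zeros(2), P_V, size=200_000)
Y = X @ C.T + V

print(Y.mean(axis=0), C @ m_X)                     # means agree
print(np.cov(Y.T), C @ P_X @ C.T + P_V, sep="\n")  # covariances agree
```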
- In general, independence implies uncorrelatedness, but not vice versa.
Sometimes, in fact in most cases in this course, we deal with a random vector whose
components are random variables, i.e.,
Cov(X) = [ cov(x, x)  cov(x, y)  cov(x, z)
           cov(x, y)  cov(y, y)  cov(y, z)
           cov(x, z)  cov(y, z)  cov(z, z) ]
where
cov(x, y) = E[(x − E[x])(y − E[y])]
hence by definition
cov(x, y) = cov(y, x).
Therefore the matrix Cov ( X ) is a symmetric matrix, i.e.,
Cov(X) = [Cov(X)]ᵀ
The diagonal terms of the covariance matrix are the variances of the individual random variables.
%%%
P_X = [ σ_x²  0     0
        0     σ_y²  0
        0     0     σ_z² ]
For any positive semidefinite symmetric matrix P_X, there is a similarity transform matrix S such
that
Λ_X = S P_X Sᵀ is diagonal.
For any Gaussian random vector, we can find a transformed random vector whose components are
uncorrelated (and hence, being Gaussian, independent), as the sketch below illustrates.
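A sketch of this decorrelation (my addition), assuming an arbitrary illustrative P_X; NumPy's symmetric eigen-solver provides the transform S:

```python
import numpy as np

# Decorrelating a Gaussian vector: for symmetric PSD P_X, the
# eigendecomposition gives S (= eigvecs^T) with S P_X S^T diagonal
P_X = np.array([[2.0, 0.8, 0.1],
                [0.8, 1.0, 0.3],
                [0.1, 0.3, 0.5]])
eigvals, eigvecs = np.linalg.eigh(P_X)   # symmetric eigen-solver
S = eigvecs.T

Lambda = S @ P_X @ S.T
print(np.round(Lambda, 10))  # diagonal matrix of the eigenvalues
# For Gaussian X, the vector Z = S X has uncorrelated (hence independent)
# components, with variances given by eigvals.
```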
%%%
Theorem 2.31. Let X₁, …, X_n be i.i.d. random variables with finite mean and variance,
E[X_k] = m < ∞, E[(X_k − m)²] = σ² < ∞, and denote their sum as Y_n := ∑_{k=1}^n X_k. Then the
distribution of the normalized sum
Z_n := (Y_n − E[Y_n]) / √var(Y_n) = (Y_n − nm)/(σ√n)
converges to the standard normal distribution N(0, 1) as n → ∞.
- Remarks:
1) See the condition E[X_k] = m < ∞, E[(X_k − m)²] = σ² < ∞: the mean and the variance
are constant, but the experiment is repeated many times. For example:
a) A die, fair or not: you roll the same die many times. Then
the normalized mean of the sum (Y_n/n = (1/n) ∑_{k=1}^n X_k) becomes Gaussian as n → ∞ (see the sketch below).
2) Some RVs have no mean; then the theorem is not applicable.
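A numerical sketch (my addition) of the theorem and remark 1a), assuming an arbitrary unfair die:

```python
import numpy as np

# CLT demo: normalized sums of (possibly unfair) die rolls look Gaussian
rng = np.random.default_rng(0)
p = np.array([0.3, 0.1, 0.1, 0.1, 0.1, 0.3])  # an unfair die, illustrative
faces = np.arange(1, 7)
m = faces @ p
sigma = np.sqrt(((faces - m) ** 2) @ p)

n, trials = 1000, 50_000
rolls = rng.choice(faces, size=(trials, n), p=p)
Z = (rolls.sum(axis=1) - n * m) / (sigma * np.sqrt(n))
print(Z.mean(), Z.std())          # ~0, ~1
print(np.mean(np.abs(Z) < 1.96))  # ~0.95, as for N(0, 1)
```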
The conditional density
f(x|y) = f(x, y) / f(y)
Remarks
E [ X ] =E [ E [ X|Y ] ]
%%% Kim’s comment
E[X] = E_X[X] → needs f_X(x)
I should say, this formula cannot be emphasized too much! This very simple fact has diverse
applications: big data, machine learning, and dynamic system analysis. We should remember it.
%%%
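A sketch (my addition) checking E[X] = E[E[X|Y]], assuming the illustrative model Y ~ U[0, 1] and X | Y = y ~ N(y, 1), so that E[X|Y] = Y:

```python
import numpy as np

# Tower property check: E[X] = E[E[X|Y]], without ever forming f_X
rng = np.random.default_rng(0)
Y = rng.uniform(0, 1, 1_000_000)
X = rng.normal(loc=Y, scale=1.0)   # X | Y = y is N(y, 1), so E[X|Y] = Y

print(X.mean())  # direct E[X]       ~ 0.5
print(Y.mean())  # E[E[X|Y]] = E[Y]  ~ 0.5
```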
Lemma 2.34.
A stochastic process (or random process) is a time-varying random variable, i.e., for any fixed t,
the process is a random variable.
%%%
Ex. 2.37
X(ω, t) = A(ω) sin t, A(ω) ~ U[−1, 1]
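A sketch of this process (my addition): fixing ω gives a deterministic sample path, while fixing t gives a random variable, whose variance we can check:

```python
import numpy as np

# Sample paths of X(w, t) = A(w) sin t with A ~ U[-1, 1]
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
A = rng.uniform(-1, 1, size=5)             # five outcomes w
paths = A[:, None] * np.sin(t)[None, :]    # five sample paths, shape (5, 200)

# At fixed t0, X(., t0) = A sin t0 has variance var(A) sin(t0)^2 = sin(t0)^2 / 3
t0 = np.pi / 2
A_many = rng.uniform(-1, 1, size=1_000_000)
print(np.var(A_many * np.sin(t0)), np.sin(t0) ** 2 / 3)  # both ~0.333
```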
Def. 2.38.
A₁ = {ω : X(ω, t) ∈ K ∀t ∈ T}, A₂ = {ω : X(ω, t) ∈ K ∀t ∈ S},
Def. 2.42. Let X be a random process defined on the time interval, T. Let
Def. 2.43. We say that a random process, X, is a Gaussian process if for every
finite collection, X_{t₁}, X_{t₂}, …, X_{t_n}, the corresponding density function
f(x₁, x₂, …, x_n) is Gaussian.
Def. 2.44. We say that a random process X is a Gaussian process if every finite
linear combination of the form
Y = ∑_{j=1}^N α_j X(t_j)
is a Gaussian random variable,
or, equivalently
F_{X_{t_n} | X_{t_1}, …, X_{t_{n−1}}}(x_n | x_1, …, x_{n−1}) = F_{X_{t_n} | X_{t_{n−1}}}(x_n | x_{n−1}).
1) Dynamics
x_{k+1} = Φ_k x_k + w_k   (2.36)
2) Given Conditions
a) Noise
E[w_k] = w̄_k
E[(w_k − w̄_k)(w_l − w̄_l)ᵀ] = W_k δ_kl
where
δ_kl = { 1, k = l
         0, k ≠ l
b) The states
E[x₀] = x̄₀
c) The correlation: the noise is uncorrelated with the current and past states,
which implies
E[(x_k − x̄_k)(w_j − w̄_j)ᵀ] = 0 ∀ j ≥ k
The mean propagates as
x̄_{k+1} = Φ_k x̄_k + w̄_k
The covariance propagates as
P_{k+1} = Φ_k P_k Φ_kᵀ + W_k
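A sketch (my addition) iterating these two propagation equations, assuming arbitrary constant Φ_k = Φ, W_k = W, and zero-mean noise:

```python
import numpy as np

# Propagating mean and covariance through x_{k+1} = Phi_k x_k + w_k
Phi = np.array([[1.0, 0.1], [0.0, 1.0]])  # constant Phi_k, illustrative
W = np.diag([0.01, 0.04])                 # constant W_k, zero-mean noise
x_bar = np.array([1.0, 0.0])              # E[x_0]
P = np.eye(2)                             # cov(x_0)

for k in range(50):
    x_bar = Phi @ x_bar        # mean:       x_{k+1} = Phi x_k (w_bar = 0)
    P = Phi @ P @ Phi.T + W    # covariance: P_{k+1} = Phi P Phi^T + W

print(x_bar, P, sep="\n")
```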
2.11 Non-linear Stochastic Difference Equations skip