
LECTURE 3: Random Variables and Expectations

Somabha Mukherjee

National University of Singapore

1 / 17
Outline

1 Random Variables

2 Joint Distribution and Independence

3 Expectation and Variance

What are Random Variables?
The outcomes of a random experiment are often quantified/summarized by
random variables. A random variable assigns values to each outcome of a
random experiment

Mathematically speaking, a random variable is a function

X : Ω → R (the set of all real numbers)

satisfying the condition that the sets {X ≤ x} := {ω ∈ Ω : X (ω) ≤ x} are


events for all x ∈ R.

Example: You roll a die twice; your sample space is:

Ω := {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6} .

Now, define X := sum of the two faces that you observe. This is a random
variable, since
X (i, j) = i + j .
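As a sanity check (not part of the lecture), the distribution of X can be computed by enumerating the 36 outcomes of Ω; a small Python sketch:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Sample space: all 36 ordered pairs (i, j) from two die rolls.
omega = list(product(range(1, 7), repeat=2))

# The random variable X assigns to each outcome the sum of the two faces.
X = {outcome: outcome[0] + outcome[1] for outcome in omega}

# Its distribution: P(X = k) for each attainable value k.
counts = Counter(X.values())
pmf = {k: Fraction(c, 36) for k, c in counts.items()}

print(pmf[7])  # P(X = 7) = 6/36 = 1/6
```

Using exact fractions avoids floating-point noise when checking that the probabilities sum to 1.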
Distribution Functions

The distribution function of a random variable X is defined as:

FX (t) = P(X ≤ t) .

For example, if a random variable X takes values 1, 2 and 3 with probabilities 1/2, 1/3 and 1/6 respectively, then its distribution function is given by:

FX(t) = 0    if t < 1,
        1/2  if 1 ≤ t < 2,
        5/6  if 2 ≤ t < 3,
        1    if t ≥ 3.

In general, if a random variable X takes values x1 < x2 < x3 < . . . with


probabilities p1 , p2 , p3 , . . . respectively, then we have:

FX (t) = p1 + p2 + . . . + pk if xk ≤ t < xk+1 .
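The step-function form of FX can be checked numerically for the three-point example above (a small sketch, not part of the lecture):

```python
from fractions import Fraction

# X takes values 1, 2, 3 with probabilities 1/2, 1/3, 1/6.
values = [1, 2, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

def F(t):
    # FX(t) = sum of p_k over all values x_k <= t
    # (a right-continuous, non-decreasing step function).
    return sum(p for x, p in zip(values, probs) if x <= t)

print(F(0), F(1), F(2.5), F(100))  # 0 1/2 5/6 1
```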

Properties of Distribution Functions

The distribution function F of any random variable X satisfies the following


four properties:
1 F is non-decreasing, i.e. F(x) ≤ F(y) whenever x ≤ y.
2 F is right-continuous, i.e. lim_{x→t+} F(x) = F(t).
3 F may not be left-continuous, but lim_{x→t−} F(x) exists for all t ∈ R, and equals P(X < t).
4 lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.

Conversely, if a function F : R → R satisfies conditions 1,2 and 4 above, then


it can be expressed as the distribution function of some random variable X .

Discrete Distributions
A random variable X is said to have a discrete distribution, if there exists a
countable set C such that P(X ∈ C ) = 1.

Examples of Discrete Distributions:

1 Binomial Distribution: P(X = k) = C(n, k) p^k (1 − p)^{n−k} (0 ≤ k ≤ n)
Number of successes in n independent trials of an experiment, with success probability p.
2 Poisson Distribution: P(X = k) = e^{−λ} λ^k / k! (k ∈ {0, 1, 2, . . .})
Limiting form of the Binomial distribution with p = λ/n.
3 Geometric Distribution: P(X = k) = p(1 − p)^{k−1} (k ∈ {1, 2, . . .})
Number of independent trials needed to get the first success (with success probability p).
4 Negative Binomial Distribution: P(X = k) = C(k−1, r−1) p^r (1 − p)^{k−r} (k ≥ r)
Number of independent trials needed to get the r-th success (with success probability p).
Continuous Distributions

A random variable X is said to have a continuous distribution, if its


distribution function F is continuous.

A special class of continuous distributions satisfies the property that there exists a non-negative function f such that the distribution function can be written as:

F(t) = ∫_{−∞}^{t} f(x) dx.

In this case, the distribution is said to be absolutely continuous, and the


function f is called the density function of the distribution.

A Word of Caution!
There are continuous distributions that are not absolutely continuous. However, such examples are somewhat involved and beyond the scope of this module. That is why some textbooks simply use the name continuous distributions for absolutely continuous distributions, which is technically incorrect.

Examples of Absolutely Continuous Distributions

1 Uniform Distribution: f(t) = 1/(b − a) (a < t < b)
Assigns equal probability to sets of equal size.
2 Normal Distribution: f(t) = (2πσ²)^{−1/2} e^{−(t−µ)²/2σ²} (t ∈ R)
Used to model the weights and heights of individuals in a population.
3 Exponential Distribution: f(t) = λe^{−λt} (t > 0)
Used to model the lifetimes of electric bulbs and radioactive particles.
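For the exponential density, F(t) = ∫_0^t λe^{−λx} dx has the closed form 1 − e^{−λt}, which a numerical integration can confirm (a sketch; the rate λ = 1.5 is an arbitrary choice for illustration):

```python
from math import exp

lam = 1.5  # assumed rate, chosen only for this example

def f(x):
    # Exponential density with rate lam.
    return lam * exp(-lam * x)

def F_numeric(t, steps=100_000):
    # Midpoint Riemann-sum approximation of F(t) = integral of f from 0 to t.
    h = t / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

t = 2.0
print(F_numeric(t), 1 - exp(-lam * t))  # the two should nearly agree
```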

Outline

1 Random Variables

2 Joint Distribution and Independence

3 Expectation and Variance

Joint Distribution of Random Variables

The joint distribution function of random variables X1 , X2 , . . . , Xn is defined


as:
FX1 ,...,Xn (t1 , . . . , tn ) = P (X1 ≤ t1 , . . . , Xn ≤ tn ).
Random variables X1 , . . . , Xn are said to be independent, if

FX1 ,...,Xn (t1 , . . . , tn ) = FX1 (t1 ) . . . FXn (tn ) .

That is, “the joint distribution function is a product of the marginal


distribution functions”.
Consequence: For independent random variables X1 , . . . , Xn and subsets
A1 , . . . , An of R,

P(X1 ∈ A1 , . . . , Xn ∈ An ) = P(X1 ∈ A1 ) . . . P(Xn ∈ An ) .

EXERCISE!!
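The product rule for independent random variables can be verified by brute force on the two-dice sample space; a small sketch with two hypothetical events (first roll even, second roll at most 2):

```python
from itertools import product
from fractions import Fraction

# Two fair dice: each of the 36 outcomes has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
n = len(omega)

def P(event):
    # Probability of an event, given as a predicate on outcomes.
    return Fraction(sum(1 for w in omega if event(w)), n)

A = lambda w: w[0] % 2 == 0  # first roll is even (X1 in {2, 4, 6})
B = lambda w: w[1] <= 2      # second roll is at most 2 (X2 in {1, 2})

both = P(lambda w: A(w) and B(w))
print(both, P(A) * P(B))  # equal, since the two rolls are independent
```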

Outline

1 Random Variables

2 Joint Distribution and Independence

3 Expectation and Variance

Expected Value of a Random Variable
The expected value of a random variable is a measure of the (weighted)
average of all the values it can take.

We will mathematically define the expected values of a discrete and an


absolutely continuous random variable.

Let X be a discrete random variable taking values in the countable set C. Then,

E(X) := Σ_{k∈C} k P(X = k),

provided the above series is well-defined.

Let X be an absolutely continuous random variable with density function f. Then,

E(X) := ∫_{−∞}^{∞} x f(x) dx,

provided the above integral exists.

Note that E(X ) depends only on the distribution of X and not on the exact
function X .
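The discrete formula can be applied directly to the earlier two-dice example, X = sum of the two faces (a sketch, not part of the lecture):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# X = sum of two fair dice; E(X) = sum over k of k * P(X = k).
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
EX = sum(Fraction(k * c, 36) for k, c in counts.items())
print(EX)  # 7
```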
Some Properties of Expectation

1 E(c) = c for a non-random constant c.

2 If X ≥ Y , then E(X ) ≥ E(Y ).

3 If X ≥ Y and EX = EY , then X = Y with probability 1.

4 If EX < ∞, then X < ∞ with probability 1.

5 Linearity: For constants a, b and random variables X, Y, we have:

E(aX + bY) = aE(X) + bE(Y).

6 If g : R → R is any function, then

E(g(X)) = Σ_j g(j) P(X = j) if X is discrete,
E(g(X)) = ∫_{−∞}^{∞} g(t) f(t) dt if X is absolutely continuous with density f.

7 If X and Y are independent random variables, then E(XY ) = E(X ) × E(Y ).
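Linearity (property 5) and the product rule for independent variables (property 7) can both be checked exactly on the two-dice sample space; a small sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)  # each outcome equally likely

# Expectation of any function of the outcome, computed by direct summation.
E = lambda g: sum(g(w) * p for w in omega)

X = lambda w: w[0]  # first roll
Y = lambda w: w[1]  # second roll

# Linearity: E(2X + 3Y) = 2 E(X) + 3 E(Y)
print(E(lambda w: 2 * X(w) + 3 * Y(w)), 2 * E(X) + 3 * E(Y))
# Independence: E(XY) = E(X) E(Y)
print(E(lambda w: X(w) * Y(w)), E(X) * E(Y))
```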

Indicator Functions and Examples
For an event A, we define the indicator of the event A as:

1_A(ω) = 1 if ω ∈ A, and 0 if ω ∉ A.

E(1_A) = (1 × P(1_A = 1)) + (0 × P(1_A = 0)) = P(A).

A Funny Example: Two monkeys are independently typing random letters on


two separate typewriters. Suppose that someone stops them when they have
completed typing exactly 100 letters. What is the expected number of places
where their letters match?

For each 1 ≤ i ≤ 100, let A_i denote the event that the letters at the i-th place match. Then, P(A_i) = 26/26² = 1/26.

The expected number of places where their letters match is given by:

E(Σ_{i=1}^{100} 1_{A_i}) = Σ_{i=1}^{100} E(1_{A_i}) = 100/26.
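A quick Monte Carlo simulation agrees with the indicator computation (a sketch, assuming a 26-letter lowercase alphabet):

```python
import random
import string

random.seed(0)  # fixed seed so the simulation is reproducible

def matches():
    # Each monkey independently types 100 uniformly random lowercase letters.
    a = random.choices(string.ascii_lowercase, k=100)
    b = random.choices(string.ascii_lowercase, k=100)
    return sum(x == y for x, y in zip(a, b))

trials = 20_000
avg = sum(matches() for _ in range(trials)) / trials
print(avg, 100 / 26)  # simulated mean vs. exact expectation 100/26 ≈ 3.846
```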

Variance and Higher Moments

Variance, or more precisely its square root (known as the standard deviation), is a measure of the spread of a random variable around its expectation.

Mathematically speaking, the variance of a random variable X is defined as:

Var(X) := E[(X − E(X))²].

An alternative expression of the variance is given by:

Var(X) = E(X²) − [E(X)]² (EXERCISE!!).

Similarly, the r-th raw moment of X is defined as E(X^r), and the r-th central moment is defined as:

E[(X − E(X))^r].
Expectation is the first raw moment and variance is the second central
moment.
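The two expressions for the variance can be checked exactly for a single fair die roll (a sketch, not part of the lecture):

```python
from fractions import Fraction

# X = one fair die roll: values 1..6, each with probability 1/6.
values = range(1, 7)
p = Fraction(1, 6)

EX = sum(k * p for k in values)
EX2 = sum(k * k * p for k in values)

var_direct = sum((k - EX) ** 2 * p for k in values)  # E[(X - EX)^2]
var_short = EX2 - EX**2                              # E(X^2) - [E(X)]^2
print(var_direct, var_short)  # both 35/12
```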

Covariance

Covariance is a measure of the relationship between two random variables X and Y.

Mathematically speaking, the covariance between random variables X and Y


is defined as:

Cov(X , Y ) := E [(X − E(X ))(Y − E(Y ))] .

An alternative expression of the covariance is given by:

Cov(X, Y) = E(XY) − E(X)E(Y) (EXERCISE!!).

Linearity: Covariance is bilinear (EXERCISE!!)

Cov(aX + bY, cW + dZ) = ac Cov(X, W) + ad Cov(X, Z) + bc Cov(Y, W) + bd Cov(Y, Z).

Var(X ) = Cov(X , X ).
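Bilinearity and the identity Var(X) = Cov(X, X) can be verified exactly on the two-dice sample space; a small sketch (the coefficients 2 and 3 and the choice S = X + Y are arbitrary):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)  # each outcome equally likely

E = lambda g: sum(g(w) * p for w in omega)
Cov = lambda g, h: E(lambda w: g(w) * h(w)) - E(g) * E(h)

X = lambda w: w[0]         # first roll
Y = lambda w: w[1]         # second roll
S = lambda w: w[0] + w[1]  # their sum

# Bilinearity check: Cov(2X + 3Y, S) = 2 Cov(X, S) + 3 Cov(Y, S)
lhs = Cov(lambda w: 2 * X(w) + 3 * Y(w), S)
rhs = 2 * Cov(X, S) + 3 * Cov(Y, S)
print(lhs, rhs)
# Var(X) = Cov(X, X)
print(Cov(X, X))  # 35/12
```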

Some Properties of Variance and Covariance
Var(aX + bY ) = a2 Var(X ) + b 2 Var(Y ) + 2ab Cov(X , Y ).

A Generalization: For n random variables X_1, . . . , X_n,

Var(Σ_{i=1}^{n} a_i X_i) = Σ_{i=1}^{n} a_i² Var(X_i) + 2 Σ_{i<j} a_i a_j Cov(X_i, X_j).

For INDEPENDENT random variables X1 , . . . , Xn , we have:

Var(X1 + . . . + Xn ) = Var(X1 ) + . . . + Var(Xn ).

Remark:
Even for random variables X_1, . . . , X_n which are not independent, but which satisfy Cov(X_i, X_j) = 0 for all i ̸= j, we have:

Var(X1 + . . . + Xn ) = Var(X1 ) + . . . + Var(Xn ).

Can you find examples of such random variables for n = 2? EXERCISE!!
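Once a candidate pair is found, zero covariance is easy to verify by direct summation. The sketch below uses one such candidate (X uniform on {−1, 0, 1} and Y = X², an assumption of this example, not taken from the slides) and only checks the covariance; proving dependence is left with the exercise:

```python
from fractions import Fraction

# Candidate pair: X uniform on {-1, 0, 1} and Y = X^2.
# Y is a function of X (so they are dependent), yet Cov(X, Y) = 0.
support = [(-1, 1), (0, 0), (1, 1)]  # (x, y) pairs, each with probability 1/3
p = Fraction(1, 3)

EX = sum(x * p for x, _ in support)
EY = sum(y * p for _, y in support)
EXY = sum(x * y * p for x, y in support)

print(EXY - EX * EY)  # Cov(X, Y) = 0
```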


