Section 3
Review of probability theory
Sourabh B Paul (IIT Delhi) Econometric Methods II Semester 2023-24 23
Review of probability theory I
Some concepts we all know but let us revise once again
A random experiment is any procedure that can, at least in theory,
be infinitely repeated and has a well-defined set of outcomes
The outcome cannot be predicted with certainty before the experiment is
run
A random variable is one that takes on numerical values and has an
outcome that is determined by an experiment
It is a real-valued function defined over the sample space of an
experiment (other types of measurable spaces are also possible)
Probability space
Probability space: $(\Omega, \mathcal{F}, P)$
Measurable space (or Borel Space):
- $(\Omega, \mathcal{F})$: a set $\Omega$ and a collection of subsets of $\Omega$, denoted $\mathcal{F}$, that includes the null set
and is closed under complement, countable unions and
countable intersections. The pair $(\Omega, \mathcal{F})$ is called a measurable space
(it consists of a set and a $\sigma$-algebra). $\Omega$ could be the real number space.
The probability measure $P : \mathcal{F} \to [0, 1]$ is a function on $\mathcal{F}$ such that:
- $P$ is countably additive (also called $\sigma$-additive): if $\{A_i\}_{i=1}^{\infty} \subseteq \mathcal{F}$ is a
countable collection of pairwise disjoint sets, then
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
- The measure of the entire sample space is equal to one: $P(\Omega) = 1$ and
$P(\emptyset) = 0$
Formal definition of random variable
Random variable: $X : \Omega \to \mathbb{R}$ is a r.v. if $\{\omega : X(\omega) \le r\} \in \mathcal{F}$ for all $r \in \mathbb{R}$
We follow the following convention: capital letter (e.g. X ) denotes a
random variable, whereas small letter (e.g. x ) denotes a particular
outcome of the random variable X .
Types of random variables I
A random variable that can only take on the values zero and one is
called a Bernoulli (or binary) random variable.
A discrete random variable is one that takes on only a finite or
countably infinite (one-to-one correspondence with the positive
integers) number of values.
A Bernoulli random variable, which takes only two possible values (0 and 1),
is an example of a discrete random variable.
Probability Mass Function (pmf) of X summarises the information
concerning the possible outcomes of X and the corresponding
probabilities: $f(x_j) = p_j$, $j = 1, 2, \ldots, k$. (It is sometimes useful to
subscript the pdf by the r.v. For example, the pdf for X is denoted by $f_X$.)
A variable X is a continuous random variable if it takes on any real
value with zero probability.
While measurements are always discrete in practice, random variables
that take on numerous values are best treated as continuous.
Types of random variables II
Probability Density Function (pdf) for continuous X .
When computing probabilities for continuous random variables, it is
easiest to work with the cumulative distribution function (cdf):
$F(x) \equiv P(X \le x)$.
Joint Distributions, Conditional Distributions, and
Independence
Joint probability density function of two discrete r.v.s:
$f_{X,Y}(x, y) = P(X = x, Y = y)$.
X and Y are said to be independent iff
$f_{X,Y}(x, y) = f_X(x) f_Y(y)$
for all x and y.
The pdfs $f_X$ and $f_Y$ are often called marginal probability density functions
to distinguish them from the joint pdf $f_{X,Y}$.
The concept of joint probability and independence can be extended for
more than two r.v.
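A small numerical check of the independence condition may help here. The sketch below is in Python (an assumption; the slides themselves work in R), and the two joint pmfs as well as the helper names `marginals` and `is_independent` are made up for illustration.

```python
from itertools import product

def marginals(joint):
    """Return the marginal pmfs f_X and f_Y from a joint pmf {(x, y): prob}."""
    f_x, f_y = {}, {}
    for (x, y), p in joint.items():
        f_x[x] = f_x.get(x, 0.0) + p
        f_y[y] = f_y.get(y, 0.0) + p
    return f_x, f_y

def is_independent(joint, tol=1e-12):
    """Check f_{X,Y}(x, y) = f_X(x) f_Y(y) for every pair (x, y)."""
    f_x, f_y = marginals(joint)
    return all(
        abs(joint.get((x, y), 0.0) - f_x[x] * f_y[y]) <= tol
        for x, y in product(f_x, f_y)
    )

# Hypothetical joint pmfs for two binary variables.
independent_joint = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}
dependent_joint = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

print(is_independent(independent_joint))
print(is_independent(dependent_joint))
```

The first table factorises into its marginals, the second does not, even though both have valid marginal pmfs.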
Exercise I
An airline has 100 seats for a particular flight. Can we decide on the
optimal (or best) number of reservations the airline should make?
What information do you need?
Simulate your strategy in R.
R simulation I
We plot the probability of overbooking and the expected profit function
Assumptions:
- $\theta = 0.85$ (probability that a customer shows up),
- Net profit per passenger travelled = 10,
- Cost per overbooked passenger = 8 (compensation to pay)
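One possible way to reproduce these two curves is sketched below. It is written in Python rather than R, and it computes the quantities exactly from the binomial distribution instead of simulating them. It assumes that customers show up independently, that a bumped passenger earns no profit, and that the function names are illustrative; none of this is the code behind the slides.

```python
from math import comb

SEATS = 100           # seats on the flight (from the exercise)
THETA = 0.85          # probability that a customer shows up
PROFIT_PER_SEAT = 10  # net profit per passenger travelled
BUMP_COST = 8         # compensation per overbooked passenger

def show_up_pmf(n, k, theta=THETA):
    """P(exactly k of n reservation holders show up), Binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def prob_overbooked(n):
    """Probability that more passengers show up than there are seats."""
    return sum(show_up_pmf(n, k) for k in range(SEATS + 1, n + 1))

def expected_profit(n):
    """Expected profit when n reservations are accepted."""
    total = 0.0
    for k in range(n + 1):
        flown = min(k, SEATS)       # passengers who actually travel
        bumped = max(k - SEATS, 0)  # passengers who must be compensated
        total += show_up_pmf(n, k) * (PROFIT_PER_SEAT * flown - BUMP_COST * bumped)
    return total

# Search the same range of reservations as the overbooking plot.
best_n = max(range(SEATS, 201), key=expected_profit)
print(best_n, round(expected_profit(best_n), 1), round(prob_overbooked(best_n), 3))
```

Up to 100 reservations no one can be bumped, so expected profit is simply 10 x 0.85 x n; beyond that point each extra reservation trades a possible extra fare against a possible compensation payment.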
R simulation II
[Figure: probability of overbooking (vertical axis, 0 to 1) against reservations made (horizontal axis, 100 to 200).]
Expected profit function
[Figure: expected profit (vertical axis, 0 to 1000) against reservations made (horizontal axis, 0 to 200), with 118 reservations marked on the horizontal axis.]
Notion of independence
Independence plays an important role in obtaining some of the classic
distributions
Example: the number of successes in a sequence of independent
Bernoulli trials
Independence is often a reasonable approximation of a more
complicated situation
Conditional distribution
In econometrics, we are usually interested in how one random variable,
call it Y , is related to one or more other variables.
Information on how X affects Y is contained in the conditional distribution of Y given
X
This information is summarized by the conditional probability density
function, defined by
$f_{Y|X}(y \mid x) = f_{X,Y}(x, y) / f_X(x)$
for all values of x such that $f_X(x) > 0$.
Interpretation for discrete case: P(Y = y |X = x ) “The probability of
Y = y given that X = x ”
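For the discrete case the definition can be applied directly to a joint pmf table. The Python sketch below uses a made-up joint pmf and a hypothetical helper `conditional_pmf`; it is only meant to show the division by the marginal.

```python
def conditional_pmf(joint, x):
    """Return f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x) as a dict over y."""
    f_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if f_x <= 0:
        raise ValueError("conditional pmf is only defined where f_X(x) > 0")
    return {y: p / f_x for (xi, y), p in joint.items() if xi == x}

# Hypothetical joint pmf of X (0 or 1) and Y (0 or 1).
joint = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

print(conditional_pmf(joint, 0))  # distribution of Y given X = 0
print(conditional_pmf(joint, 1))  # distribution of Y given X = 1
```

Each conditional pmf sums to one, and the two differ from each other, which is what it means for the distribution of Y to depend on X.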
Bayes’ law
What is the probability of an event, based on prior knowledge of a related
event?
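$P(A \mid B)$: posterior probability of A given B
$P(B \mid A)$: probability of B given A (the likelihood)
$P(A)$ and $P(B)$: prior probability of A and marginal probability of B
Then
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
Note that $P(B) = P(B \mid A)P(A) + P(B \mid \neg A)P(\neg A)$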
Mammography Problem I
Can doctors do a proper Bayesian inference?
Mammography problem:
The probability of breast cancer is 1% for a woman at age forty
who participates in routine screening. If a woman has breast cancer,
the probability is 80% that she will get a positive mammography.
If a woman does not have breast cancer, the probability is 9.6%
that she will also get a positive mammography. A woman in this
age group had a positive mammography in a routine screening.
What is the probability that she actually has breast cancer?
(Source: Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning
without instruction: Frequency formats. Psychological Review, 102(4))
Eddy (1982): 95 out of 100 physicians estimated the posterior probability
P(cancer |positive) to be between 70% and 80%.
Many physicians, college students, and staff at Harvard Medical School could not
diagnose properly using Bayesian inference.
Mammography Problem II
The answer is 7.8%.
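This figure follows from Bayes' law with A = cancer and B = positive mammography, using only the three numbers stated in the problem (1%, 80% and 9.6%):

```latex
\begin{align*}
P(\text{cancer} \mid \text{positive})
  &= \frac{P(\text{positive} \mid \text{cancer})\,P(\text{cancer})}
          {P(\text{positive} \mid \text{cancer})\,P(\text{cancer})
           + P(\text{positive} \mid \neg\text{cancer})\,P(\neg\text{cancer})} \\
  &= \frac{0.80 \times 0.01}{0.80 \times 0.01 + 0.096 \times 0.99}
   = \frac{0.008}{0.10304} \approx 0.078 .
\end{align*}
```

The posterior is small because false positives among the 99% of women without cancer far outnumber true positives among the 1% with cancer.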
Features of probability distribution I
A Measure of Central Tendency: The Expected Value
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$
For discrete case,
$E(X) = \sum_{j=1}^{k} x_j f(x_j).$
Expected value of X can be a number that is not even a possible value
of X
Expected value of a function of random variable g(X) is also defined in
the same way
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx.$
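The discrete formula is easy to try out on a fair die, a standard case where the expected value is not a possible outcome. The Python sketch below is illustrative only; the helper `expected_value` is a hypothetical name, and the die is not an example from the slides.

```python
def expected_value(pmf, g=lambda x: x):
    """E[g(X)] = sum_j g(x_j) f(x_j) for a discrete pmf given as {x_j: f(x_j)}."""
    return sum(g(x) * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}

mean = expected_value(die)                            # E(X)
mean_of_square = expected_value(die, lambda x: x**2)  # E[g(X)] with g(x) = x^2
print(mean, mean_of_square)
```

Passing a function `g` reuses the same sum for E[g(X)], mirroring the way the slides define it.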
Features of Expectation
For any constant c, $E(c) = c$
For any constants a and b, $E(aX + b) = aE(X) + b$
If $\{a_1, a_2, \ldots, a_n\}$ are constants and $\{X_1, X_2, \ldots, X_n\}$ are random
variables, then
$E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n)$
We cannot extend this property to non-linear functions: in general $E[g(X)] \neq g(E[X])$, for example $E(X^2) \neq [E(X)]^2$ whenever $\mathrm{Var}(X) > 0$
Other measures of central tendency
Another Measure of Central Tendency: The Median
The definition is too complicated for our purpose: Intuitively, the
median value of X divides the area under the pdf into two equal parts
(for continuous case) or divides the possible discrete values in order
into two equal parts (for discrete case)
If the distribution of the random variable is symmetric about the mean, then the median
and the mean are the same.
Measures of variability
Measures of Variability: Variance and Standard Deviation
$\mathrm{Var}(X) \equiv E\left[(X - \mu)^2\right]$
The standard deviation of a r.v. X, denoted by sd(X), is the positive
square root of the variance: $\mathrm{sd}(X) = +\sqrt{\mathrm{Var}(X)}$
Variance plot
[Figure: the pdfs $f_X$ and $f_Y$ plotted against X, Y, with the mean $\mu$ marked on the horizontal axis.]
Properties of variance
$\mathrm{Var}(X) = 0$ if and only if there is a constant c such that $P(X = c) = 1$,
in which case $E(X) = c$
For any constants a and b, $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$
Standardising a r.v. I
Given a r.v. X, we can subtract its mean ($\mu$) and divide by its sd ($\sigma$) to
define a new r.v.
$Z = \frac{X - \mu}{\sigma}$
such that $E(Z) = 0$ and $\mathrm{Var}(Z) = 1$
We can use the standardised version of a r.v. to define other features of a
distribution
These features are described by higher order moments
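The two moments of Z follow from the linearity rules for expectation and variance stated earlier, applied with $a = 1/\sigma$ and $b = -\mu/\sigma$ (assuming $\sigma > 0$):

```latex
\begin{align*}
E(Z) &= E\!\left(\tfrac{1}{\sigma}X - \tfrac{\mu}{\sigma}\right)
      = \tfrac{1}{\sigma}E(X) - \tfrac{\mu}{\sigma}
      = \tfrac{\mu}{\sigma} - \tfrac{\mu}{\sigma} = 0, \\
\mathrm{Var}(Z) &= \mathrm{Var}\!\left(\tfrac{1}{\sigma}X - \tfrac{\mu}{\sigma}\right)
      = \tfrac{1}{\sigma^{2}}\,\mathrm{Var}(X)
      = \tfrac{\sigma^{2}}{\sigma^{2}} = 1 .
\end{align*}
```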
Skewness and Kurtosis
Skewness: $E(Z^3) = E[(X - \mu)^3]/\sigma^3$ (Fisher-Pearson coefficient of
skewness)
This is zero if the distribution is symmetric around the mean
Negative skew (left-skewed): the left tail is longer. Positive skew
(right-skewed): the right tail is longer
Kurtosis: $E(Z^4) = E[(X - \mu)^4]/\sigma^4$. This is always positive.
A larger value means the tails are thicker
We compare kurtosis with the reference value of 3 for a Normal
distribution (the difference is the excess kurtosis).
Mesokurtic: zero excess kurtosis. Leptokurtic: positive excess kurtosis.
Platykurtic: negative excess kurtosis
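Both quantities are just moments of the standardised variable, so they can be computed directly for a discrete pmf. The Python sketch below uses two Bernoulli distributions chosen for illustration; `standardized_moment` is a hypothetical helper name.

```python
def standardized_moment(pmf, k):
    """E(Z^k) with Z = (X - mu) / sigma, for a discrete pmf {x_j: f(x_j)}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    sigma = var ** 0.5
    return sum(((x - mu) / sigma) ** k * p for x, p in pmf.items())

fair_coin = {0: 0.5, 1: 0.5}    # symmetric Bernoulli
rare_event = {0: 0.9, 1: 0.1}   # right-skewed Bernoulli

for name, pmf in [("fair coin", fair_coin), ("rare event", rare_event)]:
    skewness = standardized_moment(pmf, 3)
    excess_kurtosis = standardized_moment(pmf, 4) - 3
    print(name, round(skewness, 3), round(excess_kurtosis, 3))
```

The symmetric coin has zero skewness and is platykurtic, while the rare-event variable is right-skewed.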
Features of Joint and Conditional Distributions
Measures of Association: Covariance and Correlation
$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
$= E(XY) - \mu_X \mu_Y$
Properties of covariance
If X and Y are independent then $\mathrm{Cov}(X, Y) = 0$ (converse not true)
For any constants $a_1$, $b_1$, $a_2$ and $b_2$
$\mathrm{Cov}(a_1 X + b_1, a_2 Y + b_2) = a_1 a_2 \mathrm{Cov}(X, Y)$
Cauchy-Schwarz Inequality:
$|\mathrm{Cov}(X, Y)| \le \mathrm{sd}(X)\,\mathrm{sd}(Y)$
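A standard counterexample shows why zero covariance does not imply independence: take X symmetric around zero and Y = X^2. The Python sketch below works through it with a three-point distribution chosen for illustration.

```python
# X takes the values -1, 0, 1 with equal probability, and Y = X^2.
support = [-1, 0, 1]
prob = 1 / 3

e_x = sum(x * prob for x in support)          # E(X)
e_y = sum(x**2 * prob for x in support)       # E(Y) = E(X^2)
e_xy = sum(x * x**2 * prob for x in support)  # E(XY) = E(X^3)
cov_xy = e_xy - e_x * e_y                     # Cov(X, Y) = E(XY) - mu_X mu_Y

# Independence would require P(X = 0, Y = 0) = P(X = 0) P(Y = 0).
p_joint = prob           # X = 0 forces Y = 0, so the joint probability is 1/3
p_product = prob * prob  # P(X = 0) * P(Y = 0), and P(Y = 0) = P(X = 0) = 1/3

print(cov_xy, p_joint, p_product)
```

The covariance is zero, yet the joint probability differs from the product of the marginals, so Y is completely determined by X while being uncorrelated with it.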
Correlation
Correlation Coefficient:
$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
Both Cov() and Corr() are measures of linear dependence
Correlation is bounded by -1 and +1: $-1 \le \rho_{XY} \le 1$
For any constants $a_1$, $b_1$, $a_2$ and $b_2$
$\mathrm{Corr}(a_1 X + b_1, a_2 Y + b_2) = \mathrm{Corr}(X, Y)$ if $a_1 a_2 > 0$
$\mathrm{Corr}(a_1 X + b_1, a_2 Y + b_2) = -\mathrm{Corr}(X, Y)$ if $a_1 a_2 < 0$
Variance of sum of random variables
For any constants a and b
$\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$
We can extend this for more than two variables
$\mathrm{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 \mathrm{Var}(X_i) + 2 \sum_{j=1}^{n} \sum_{i>j} a_i a_j \mathrm{Cov}(X_i, X_j)$
If the $X_i$ are pairwise uncorrelated, then this is simply the sum of the variances
(each multiplied by the square of its coefficient)
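The general formula can be evaluated mechanically from a covariance matrix. The Python sketch below does so for a made-up two-variable example; `variance_of_sum` is a hypothetical helper and the numbers are illustrative.

```python
def variance_of_sum(a, cov):
    """Var(sum_i a_i X_i) from coefficients a and covariance matrix cov."""
    n = len(a)
    # Own-variance terms: sum_i a_i^2 Var(X_i).
    own = sum(a[i] ** 2 * cov[i][i] for i in range(n))
    # Cross terms: sum over pairs with i > j of a_i a_j Cov(X_i, X_j).
    cross = sum(a[i] * a[j] * cov[i][j] for j in range(n) for i in range(j + 1, n))
    return own + 2 * cross

# Hypothetical example: Var(X) = 1, Var(Y) = 2, Cov(X, Y) = 0.5.
cov = [[1.0, 0.5],
       [0.5, 2.0]]

print(variance_of_sum([1, 2], cov))   # Var(X + 2Y)
print(variance_of_sum([1, -1], cov))  # Var(X - Y)
```

Note that a positive covariance raises the variance of a sum but lowers the variance of a difference, because the cross term carries the sign of the product of the coefficients.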
Conditional Expectation I
Often in the social sciences, we would like to explain one variable,
called Y , in terms of another variable, say, X .
If Y is related to X in a nonlinear fashion, we would like to know this.
We can summarize the relationship between Y and X by looking at the
conditional expectation of Y given X , sometimes called the conditional
mean
$E(Y \mid X = x)$, in short, $E(Y \mid x)$
When Y is continuous,
$E(Y \mid x) = \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x)\,dy.$
$E(Y \mid x)$ is some function of x, which tells us how the expected value of
Y varies with x
Conditional Expectation II
For example,
E (WAGE | EDUC ) = 1.05 + 0.45 EDUC
Conditional expectation can be a non-linear function as well
Properties of Conditional Expectation I
$E[c(X) \mid X] = c(X)$, for any function c(X)
For functions a(X) and b(X),
$E[a(X)Y + b(X) \mid X] = a(X)E(Y \mid X) + b(X)$
If X and Y are independent, then $E(Y \mid X) = E(Y)$
Law of iterated expectations: $E[E(Y \mid X)] = E(Y)$
A more general case: $E(Y \mid X) = E[E(Y \mid X, Z) \mid X]$
If $E(Y \mid X) = E(Y)$, then $\mathrm{Cov}(X, Y) = 0$. In fact, every function of X
is uncorrelated with Y. (Converse is NOT true)
- If X and Y are correlated, then $E(Y \mid X)$ must depend on X.
- The conditional expectation captures the nonlinear relationship between
X and Y, whereas correlation captures linear association (remember the
example of $Y = X^2$).
Quick exercise: If U and X are random variables such that $E(U \mid X) = 0$,
then argue that $E(U) = 0$, and U and X are uncorrelated.
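The law of iterated expectations can be checked numerically on any small joint pmf. The Python sketch below uses a made-up table, and the helper names `marginal_x` and `cond_mean_y` are hypothetical.

```python
# Hypothetical joint pmf of X (0 or 1) and Y (1, 2 or 3).
joint = {
    (0, 1): 0.10, (0, 2): 0.20, (0, 3): 0.10,
    (1, 1): 0.05, (1, 2): 0.15, (1, 3): 0.40,
}

def marginal_x(joint):
    """f_X(x), obtained by summing the joint pmf over y."""
    f_x = {}
    for (x, _), p in joint.items():
        f_x[x] = f_x.get(x, 0.0) + p
    return f_x

def cond_mean_y(joint, x):
    """E(Y | X = x) = sum_y y f_{X,Y}(x, y) / f_X(x)."""
    f_x = marginal_x(joint)[x]
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / f_x

f_x = marginal_x(joint)
e_y = sum(y * p for (_, y), p in joint.items())                     # E(Y) directly
e_of_cond = sum(cond_mean_y(joint, x) * p for x, p in f_x.items())  # E[E(Y|X)]

print(e_y, e_of_cond)
```

Averaging the conditional means with the weights $f_X(x)$ recovers the unconditional mean, which is the step the quick exercise relies on.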
Conditional Variance
Given random variables X and Y , the variance of Y , conditional on
X = x , is simply the variance associated with the conditional
distribution of Y , given X = x
$\mathrm{Var}(Y \mid X = x) = E(Y^2 \mid x) - [E(Y \mid x)]^2$
If X and Y are independent, then Var (Y | X ) = Var (Y )
Some well known distributions
Normal distribution
Standard Normal distribution
Chi Square distribution
t distribution
F distribution
Normal distribution
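The pdf of a normal variable $X \sim \mathrm{Normal}(\mu, \sigma^2)$ is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$
where $\mu = E(X)$ and $\sigma^2 = \mathrm{Var}(X)$
Standard normal variable $Z \sim \mathrm{Normal}(0, 1)$. The pdf is
$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp(-z^2/2)$
Cumulative distribution: $\Phi(z) = P(Z \le z)$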
Standard normal properties
Symmetric
$P(Z > z) = 1 - \Phi(z)$
$P(Z < -z) = P(Z > z)$
$P(a \le Z \le b) = \Phi(b) - \Phi(a)$
$P(|Z| > c) = 2[1 - \Phi(c)]$ for $c > 0$
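These identities can be verified numerically. The Python sketch below builds the standard normal cdf from the error function in the standard library; the function names are illustrative and the cut-off 1.96 is simply a familiar example value.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ Normal(0, 1), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_sided_tail(c):
    """P(|Z| > c) = 2 [1 - Phi(c)] for c > 0."""
    return 2.0 * (1.0 - std_normal_cdf(c))

z = 1.96
print(1.0 - std_normal_cdf(z))                     # P(Z > z)
print(std_normal_cdf(-z))                          # P(Z < -z), equal by symmetry
print(std_normal_cdf(1.0) - std_normal_cdf(-1.0))  # P(-1 <= Z <= 1)
print(two_sided_tail(z))                           # P(|Z| > 1.96)
```

The two-sided tail at 1.96 is very close to 0.05, which is why that cut-off appears so often in hypothesis testing.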
Properties of Normal
If $X \sim N(\mu, \sigma^2)$ then $aX + b \sim N(a\mu + b, a^2 \sigma^2)$
If X and Y are jointly normally distributed, then they are independent if
and only if $\mathrm{Cov}(X, Y) = 0$
Any linear combination of independent, identically distributed normal
random variables has a normal distribution
Chi square
Let $Z_i$, $i = 1, 2, \ldots, n$ be independent random variables, each
distributed as standard normal. Define a new random variable
$X = \sum_{i=1}^{n} Z_i^2$
X has what is known as a chi-square distribution with n degrees of
freedom
$X \sim \chi^2_n$
$E(X) = n$ and $\mathrm{Var}(X) = 2n$
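The construction lends itself to a quick Monte Carlo check of the two moments. The Python sketch below is illustrative: the seed, the sample size and the choice n = 4 are arbitrary assumptions, and the sample moments will only be close to n and 2n, not equal to them.

```python
import random
from statistics import fmean, pvariance

def chi_square_draw(n, rng):
    """One draw of X = sum of n squared independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 4                      # degrees of freedom
draws = [chi_square_draw(n, rng) for _ in range(20000)]

sample_mean = fmean(draws)     # should be near n
sample_var = pvariance(draws)  # should be near 2n
print(sample_mean, sample_var)
```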
Plot of chi-square
[Figure: Chi-Square Distribution. Densities for df = 2, 4 and 8; x from 0 to 25, density from 0 to 0.5.]
t distribution
The t distribution is the workhorse in classical statistics and multiple
regression analysis
We obtain a t distribution from a standard normal and a chi-square
random variable
$T = \frac{Z}{\sqrt{X/n}}$
where $Z \sim N(0, 1)$ and $X \sim \chi^2_n$ and they are independent.
We say $T \sim t_n$
The degrees of freedom come from the chi-square random variable in the
denominator
The pdf of the t distribution has a shape similar to that of the standard
normal distribution except that it is more spread out
As the degrees of freedom get large, the t distribution approaches the
standard normal distribution.
$E(T) = 0$ for $n > 1$ and $\mathrm{Var}(T) = n/(n - 2)$ for $n > 2$
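The ratio construction can be simulated directly from normal draws. The Python sketch below is illustrative: the seed, the sample size and n = 10 are arbitrary assumptions, and the sample variance should only be close to n/(n - 2).

```python
import random
from math import sqrt
from statistics import fmean, pvariance

def t_draw(n, rng):
    """One draw of T = Z / sqrt(X / n), with Z ~ N(0, 1) and X ~ chi-square(n)."""
    z = rng.gauss(0.0, 1.0)
    x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))  # drawn independently of z
    return z / sqrt(x / n)

rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 10                     # degrees of freedom
draws = [t_draw(n, rng) for _ in range(20000)]

# Sample mean, sample variance, and the theoretical variance n / (n - 2).
print(fmean(draws), pvariance(draws), n / (n - 2))
```

With n = 10 the theoretical variance is 1.25, larger than the standard normal's variance of 1, which is the "more spread out" shape described above.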
Plot of t distribution
[Figure: t Distribution. Densities for df = 1, 2 and 24; x from -4 to 4, density from 0 to 0.4.]
F distribution
Another important distribution for statistics and econometrics
(hypothesis testing in the context of multiple linear regression model,
ANOVA)
Let $X_1 \sim \chi^2_{k_1}$ and $X_2 \sim \chi^2_{k_2}$ be two independent random variables.
Then the random variable
$F = \frac{X_1/k_1}{X_2/k_2}$
has a distribution known as the F distribution with $(k_1, k_2)$ degrees of
freedom.
We denote $F \sim F_{k_1,k_2}$
The order of the degrees of freedom is important: $k_1$ belongs to the numerator and $k_2$ to the denominator
Plot of F distribution
[Figure: F Distribution. Densities for df = (2, 8), (6, 8) and (6, 20); x from 0 to 4, density from 0 to 1.]