Empirical Methods for Finance - Exam Cheat sheet

1. Linear regression model

(a) Univariate linear regression model: y = β0 + β1 x + u

(b) Multivariate linear regression model: y = β0 + β1 x1 + ... + βk xk + u
(c) Zero-conditional mean assumption: E(u|x) = E(u) = 0
(d) The Population Regression Function (PRF): under the zero-conditional mean assumption,

E(y|x) = β0 + β1 x1 + ... + βk xk

This implies y = E(y|x) + u

(e) The Fitted Line or Regression Line: ŷ = β̂0 + β̂1 x1 + . . . + β̂k xk
The regression residuals are defined as û = y − ŷ. Hence, y = ŷ + û
(f) Models with logs.

2. Ordinary Least Squares (OLS) Estimator

(a) The OLS parameters β̂0, β̂1 are chosen to minimise the sum of squared residuals:
min_{β̂0,β̂1} SSR = min_{β̂0,β̂1} Σ_{i=1}^{N} ûi² = min_{β̂0,β̂1} Σ_{i=1}^{N} (yi − ŷi)² = min_{β̂0,β̂1} Σ_{i=1}^{N} (yi − β̂0 − β̂1 xi)²

(b) The formula for the OLS estimators in the univariate case is given by:
β̂1 = Σ_{i=1}^{N} (yi − ȳ)(xi − x̄) / Σ_{i=1}^{N} (xi − x̄)² = sample covariance(x, y) / sample variance(x),   β̂0 = ȳ − β̂1 x̄

(c) Algebraic properties of OLS

i. Σ_{i=1}^{N} ûi = 0 (The sum (and the sample average) of the OLS residuals is zero)
ii. Σ_{i=1}^{N} xi ûi = 0 (The sample covariance between the regressor(s) and the OLS residuals is zero)
iii. The point (x̄, ȳ) always lies on the regression line
(d) Goodness of Fit
i. SST = Σ_{i=1}^{N} (yi − ȳ)², SSE = Σ_{i=1}^{N} (ŷi − ȳ)², SSR = Σ_{i=1}^{N} ûi² (the fitted values have sample mean ȳ and the residuals have sample mean zero)
ii. SST = SSE + SSR (sum of squares Total, Explained, Residual)
iii. R-squared. R2 = SSE/SST = 1 − SSR/SST
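
The formulas in this section can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the data are made up for illustration, and the function name ols_univariate is hypothetical. It computes β̂0 and β̂1 from the covariance/variance formula, then the residuals and the goodness-of-fit quantities.

```python
from statistics import mean

def ols_univariate(x, y):
    """Return (b0, b1) from the univariate OLS formulas in 2(b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # sample covariance / sample variance
    b0 = y_bar - b1 * x_bar     # line passes through (x_bar, y_bar)
    return b0, b1

# Made-up illustrative data (an assumption, not from the course)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_univariate(x, y)
y_hat = [b0 + b1 * xi for xi in x]             # fitted values
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]  # residuals

y_bar = mean(y)
sst = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
sse = sum((fi - y_bar) ** 2 for fi in y_hat)   # explained sum of squares
ssr = sum(ui ** 2 for ui in u_hat)             # residual sum of squares
r2 = 1 - ssr / sst

print(f"b0={b0:.4f}, b1={b1:.4f}, R2={r2:.4f}")
```

Up to floating-point rounding, sum(u_hat) and the sum of xi·ûi are both zero, and sst equals sse + ssr, which matches properties (c) and (d).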

3. Linear regression assumptions

• MLR.1 (Linear in Parameters) The population model can be written as y = β0 + β1 x1 + ... + βk xk + u
• MLR.2 (Random Sampling) We have a random sample of size N, {(xi1, ..., xik, yi) : i = 1, ..., N}, following the population model in MLR.1
• MLR.3 (No Perfect Collinearity) In the sample (and in the population), none of the independent variables is
constant, and there are no exact linear relationships among the independent variables.
• MLR.4 (Zero Conditional Mean) In the population, the error term has zero mean given any value of the explanatory
variables: E(u|x) = E(u) = 0
• MLR.5 (Homoskedasticity, or Constant Variance) The error u has the same variance given any value of the
explanatory variables x: Var(u|x) = σ²
• MLR.6 (Normality) The population error u is independent of the explanatory variables x1, ..., xk and is normally
distributed with zero mean and variance σ²: u ∼ iid Normal(0, σ²)
4. Statistical properties of OLS
(a) Unbiasedness of OLS. Under MLR.1 through MLR.4 the OLS estimator is unbiased for any sample size n, i.e. E(β̂) = β
(b) Omitted variable bias. Suppose the true model is y = β0 + β1 x1 + β2 x2 + u. Let β̃1 be the OLS estimator from
estimating the univariate model of y on x1 only. Then E(β̃1) = β1 + δ̃1 β2 = β1 + BIAS,
where BIAS = δ̃1 β2 and δ̃1 = SampleCov(x1, x2) / SampleVar(x1) is the slope from regressing x2 on x1.
(c) Variance of the OLS estimator. Under MLR.1-MLR.5, the variance of β̂1 is Var(β̂1) = σ² / Σ_{i=1}^{N} (xi − x̄)² = σ² / SSTx, and
therefore the standard deviation is sd(β̂1) = σ / √SSTx

(d) Error variance estimator. An unbiased estimator of σ² is σ̂² = (n − 2)⁻¹ Σ_{i=1}^{n} ûi²
(e) Standard error of the OLS estimator. An estimator of sd(β̂j) is obtained by replacing σ with σ̂; in the univariate case, se(β̂1) = σ̂ / √SSTx

(f) Distribution of the OLS estimator. Under MLR.1 through MLR.6, conditional on the sample values of the independent
variables, the OLS estimator is normally distributed: β̂j ∼ Normal(βj, Var(β̂j)). Therefore, (β̂j − βj)/sd(β̂j) ∼ Normal(0, 1).
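
The omitted variable bias formula has an exact in-sample counterpart: the short-regression slope equals β̂1 + β̂2 δ̃1, where δ̃1 is the slope from regressing x2 on x1. The sketch below checks this on made-up data. It is only an illustration: the two-regressor estimates are obtained by partialling out (the Frisch-Waugh-Lovell result), which is a choice made here to stay within plain Python, and the helper names slope and residuals are hypothetical.

```python
from statistics import mean

def slope(x, y):
    """Univariate OLS slope of y on x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sum((a - x_bar) ** 2 for a in x)
    return num / den

def residuals(x, y):
    """Residuals from the univariate OLS regression of y on x."""
    b1 = slope(x, y)
    b0 = mean(y) - b1 * mean(x)
    return [b - b0 - b1 * a for a, b in zip(x, y)]

# Made-up data (assumption): x2 is correlated with x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 8.2, 8.8, 13.1, 13.9]

# Long regression y on x1 and x2, via partialling out:
# the coefficient on x1 is the slope of y on (x1 purged of x2).
beta1_hat = slope(residuals(x2, x1), y)
beta2_hat = slope(residuals(x1, x2), y)

# Short regression y on x1 only (x2 omitted)
beta1_tilde = slope(x1, y)

# Slope from regressing x2 on x1: SampleCov(x1, x2) / SampleVar(x1)
delta1_tilde = slope(x1, x2)

bias = delta1_tilde * beta2_hat
print(beta1_tilde, beta1_hat + bias)  # the two numbers coincide
```

The sign of the bias follows from the signs of β2 and of the correlation between x1 and x2: with both positive, as in this made-up sample, the short regression overstates β̂1.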

5. Statistical Inference
(a) t-statistic. t(β̂j) = (β̂j − βj)/se(β̂j)
(b) Under MLR.1 through MLR.6, the t-statistic follows a t-distribution with (n−k−1) degrees of freedom: t(β̂j) = (β̂j − βj)/se(β̂j) ∼ t_{n−k−1},
where n is the number of observations and k is the number of regressors.
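(c) The p-value is the probability of observing a value at least as extreme as the observed test statistic under the null hypothesis.
(d) F-test. F = [(SSR_R − SSR_UR) / SSR_UR] × (n − k − 1)/q, where SSR_R and SSR_UR are the sums of squared residuals of the restricted and
unrestricted models, and q is the number of parameters set to zero under the null (i.e. the number of restrictions).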
(e) Confidence intervals. Under MLR.1-MLR.6, the 95% confidence interval for β is β̂ ± c_{97.5%} · se(β̂), where c_{97.5%} is the
97.5th percentile of the t_{n−k−1} distribution
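
A worked univariate example (k = 1) may help tie these formulas together. This is a sketch on made-up data, not a definitive implementation. The critical value 3.182 is the tabulated 97.5th percentile of the t distribution with 3 degrees of freedom and is hard-coded as an assumption, because the Python standard library has no t distribution. The F-test compares the intercept-only (restricted) model with the full model, so q = 1.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative data (assumption)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1

x_bar, y_bar = mean(x), mean(y)
sst_x = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
b0 = y_bar - b1 * x_bar

# Unrestricted model: y = b0 + b1 x + u
ssr_ur = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma2_hat = ssr_ur / (n - k - 1)   # error variance estimator
se_b1 = sqrt(sigma2_hat / sst_x)    # standard error of b1

# t-statistic for H0: beta1 = 0
t_stat = (b1 - 0.0) / se_b1

# Restricted model under H0: y = b0 + u, so SSR_R is the total sum of squares
ssr_r = sum((yi - y_bar) ** 2 for yi in y)
q = 1
f_stat = (ssr_r - ssr_ur) / ssr_ur * (n - k - 1) / q

# 95% confidence interval; 3.182 is the t-table value for 3 df (assumption)
c = 3.182
ci = (b1 - c * se_b1, b1 + c * se_b1)

print(f"t={t_stat:.2f}, F={f_stat:.2f}, CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

With a single restriction the F-statistic equals the square of the t-statistic, which gives a quick consistency check on hand calculations.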

6. Regression Diagnostics

(a) Functional Form Misspecification tests.

• RESET test (ovtest): 1. Estimate y = β0 + β1 x1 + ... + βk xk + u. 2. Compute the predicted values ŷ. 3. Estimate
y = β0 + β1 x1 + ... + βk xk + δ1 ŷ² + δ2 ŷ³ + e. 4. Test H0: δ1 = δ2 = 0
• Davidson-MacKinnon test for nonnested alternatives. Use the predicted values from model 1 as a regressor in model 2,
and vice versa. Look at their statistical significance.
(b) Breusch-Pagan test for homoskedasticity (hettest)
(c) Shapiro-Wilk test for normality of residuals (swilk res)
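
The four RESET steps can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a replacement for ovtest: the data are made up (y is roughly quadratic in x, so a linear model is misspecified), the helper ols_fit is hypothetical and solves the normal equations by Gauss-Jordan elimination, and the joint test of δ1 = δ2 = 0 is carried out with the F-statistic formula from the inference section. Only the statistic is computed; it still has to be compared with an F(2, n − k − 3) critical value.

```python
def ols_fit(columns, y):
    """OLS of y on an intercept plus the given regressor columns.

    Returns (ssr, fitted). Solves the normal equations (X'X) b = X'y
    by Gauss-Jordan elimination with partial pivoting.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    b = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ssr, fitted

# Made-up data (assumption): y is approximately 1 + x^2
x = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1]
y = [1.05, 1.04, 1.40, 1.78, 2.47, 3.20, 4.29, 5.38]
n, k = len(y), 1

# Steps 1 and 2: estimate the linear model and keep the fitted values
ssr_r, y_hat = ols_fit([x], y)

# Step 3: add squared and cubed fitted values as extra regressors
y_hat2 = [f ** 2 for f in y_hat]
y_hat3 = [f ** 3 for f in y_hat]
ssr_ur, _ = ols_fit([x, y_hat2, y_hat3], y)

# Step 4: F-statistic for H0: delta1 = delta2 = 0.
# The unrestricted model has k + 2 regressors, so df = n - (k + 2) - 1.
q = 2
f_stat = (ssr_r - ssr_ur) / ssr_ur * (n - (k + 2) - 1) / q
print(f"RESET F-statistic: {f_stat:.2f}")
```

A large F-statistic, as expected for this deliberately misspecified example, points to neglected nonlinearity in the functional form.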

7. Event studies
(a) Abnormal return. AR = R - E[R]
(b) Average abnormal return. AR̄τ = (1/N) Σ_{i=1}^{N} ARi,τ
(c) Cumulative average abnormal return. CAR(τ1, τ2) = Σ_{τ=τ1}^{τ2} AR̄τ
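
The three definitions can be illustrated with a small numeric sketch. All numbers are made up, and the expected returns E[R] are simply taken as given here; how they are estimated (for example from a market model) is outside this sketch.

```python
# Rows are firms, columns are event days tau = -1, 0, +1 (made-up numbers)
taus = [-1, 0, 1]
realized = [
    [0.010, 0.030, 0.000],
    [0.000, 0.050, 0.010],
    [-0.010, 0.010, 0.020],
]
expected = [
    [0.005, 0.005, 0.005],
    [0.002, 0.002, 0.002],
    [0.001, 0.001, 0.001],
]

# (a) Abnormal return per firm and day: AR = R - E[R]
ar = [[r - e for r, e in zip(r_row, e_row)]
      for r_row, e_row in zip(realized, expected)]

# (b) Average abnormal return across the N firms, for each event day
n_firms = len(ar)
avg_ar = [sum(ar[i][t] for i in range(n_firms)) / n_firms
          for t in range(len(taus))]

# (c) Cumulative average abnormal return over the window [tau1, tau2]
def car(tau1, tau2):
    return sum(a for tau, a in zip(taus, avg_ar) if tau1 <= tau <= tau2)

print(avg_ar, car(-1, 1))
```

The same car function gives shorter windows too, for example car(0, 1) for the announcement day and the day after.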

8. Fixed-effects and Difference-in-differences

(a) The FE model: yit = β0 + β1 xit + ai + eit
(b) FE estimator from estimating the transformed model: ÿit = β1 ẍit + ëit where z̈it = zit − z̄i
(c) DiD: yit = β0 + β1 (treat_i) + β2 (after_t) + β3 (treat_i × after_t) + uit. β̂3 = (ȳ_{T,A} − ȳ_{T,B}) − (ȳ_{C,A} − ȳ_{C,B})
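
Both estimators reduce to simple arithmetic on group means, as the sketch below shows. It is an illustration on made-up data: the panel is constructed so that y = ai + 2 x exactly, which means the within estimator should recover a slope of 2, and the DiD sample has four groups with two observations each. The function names fe_slope and did are hypothetical.

```python
from statistics import mean

def fe_slope(panel):
    """Within (fixed-effects) estimator of beta1.

    panel: list of (unit, x, y). Demeans x and y within each unit,
    then regresses demeaned y on demeaned x without an intercept.
    """
    units = {u for u, _, _ in panel}
    x_bar = {u: mean(x for v, x, _ in panel if v == u) for u in units}
    y_bar = {u: mean(y for v, _, y in panel if v == u) for u in units}
    num = sum((x - x_bar[u]) * (y - y_bar[u]) for u, x, y in panel)
    den = sum((x - x_bar[u]) ** 2 for u, x, _ in panel)
    return num / den

def did(obs):
    """DiD estimate from (treat, after, y) observations via four group means."""
    def group_mean(treat, after):
        return mean(y for t, a, y in obs if t == treat and a == after)
    return ((group_mean(1, 1) - group_mean(1, 0))
            - (group_mean(0, 1) - group_mean(0, 0)))

# Made-up panel (assumption): unit effects a_A = 1, a_B = 10, true slope 2
panel = [("A", 1.0, 3.0), ("A", 2.0, 5.0), ("A", 3.0, 7.0),
         ("B", 2.0, 14.0), ("B", 3.0, 16.0), ("B", 4.0, 18.0)]
beta_fe = fe_slope(panel)

# Made-up DiD sample (assumption): (treat, after, y)
obs = [(1, 0, 10.0), (1, 0, 12.0), (1, 1, 15.0), (1, 1, 17.0),
       (0, 0, 8.0), (0, 0, 10.0), (0, 1, 10.0), (0, 1, 12.0)]
beta3_hat = did(obs)

print(beta_fe, beta3_hat)
```

In the DiD sample the treated group rises by 5 and the control group by 2, so the estimate is 3. In the panel, demeaning removes ai, which is why the within estimator is unaffected by the large difference in levels between units A and B.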

