ECON0019: Quantitative Economics and
Econometrics
Lecture 7: Specification + Qualitative Data
Professor Dennis Kristensen
UCL
Autumn 2018
Dennis Kristensen (UCL) ECON0019: Quantitative Economics and Econometrics
Autumn 2018 1 / 43
Today’s lecture
More on specification of linear regressions. How to use:
transformations (e.g. logarithmic)
interactions
to estimate (more) flexible models
Model selection
The adjusted R-squared
The trade-offs when adding explanatory variables to the model
Dummy variables
(Testing for) structural breaks
Functional form
Allowing for nonlinearities in x and y
MLR allows us to estimate, given data on y and m ≥ 1 regressors,

g(y) = β0 + Σ_{j=1}^k βj fj(x1, ..., xm) + u,

for known transformations g(y) and fj(x1, ..., xm), j = 1, ..., k. We can:
transform our variables (outcome and explanatory), e.g., g(y) = log(y), fj(x1, ..., xm) = log(xj)
add higher-order terms of explanatory variables, e.g., fj(x1, ..., xm) = xj²
add interactions of explanatory variables, e.g., fj(x1, ..., xm) = xj1 xj2.
But which transformations should we use (if any), and how do we interpret the resulting model?
General approach to modeling nonlinearities
1 Identify a possible nonlinear relationship: economic theory, similar existing applications (literature search)
2 Specify the nonlinear relationship (some examples on next slides)
3 Determine whether the nonlinear specification improves over the linear one: compare (test) the nonlinear model with the linear one
4 Eyeballing: compare the fitted model with the actual data (scatter plot + predicted values), look at the residuals (ideally, they should be close to normally distributed and homoskedastic)
Logarithmic transformations
Logarithmic transformations are often useful when working with economic data:
many variables have lots of values close to 0
but also a few very large values (i.e., outliers), and OLS is very sensitive to these "large" values
Taking the logarithm narrows the range of the variable ⇒ reduces the impact of outliers
After taking logarithms, the distribution of the variable may look more Normal (and so OLS is better behaved/closer to normally distributed)
Often log-transformed: amounts of money (wages, expenditures, sales, market value), size (city, firm)
but no clear rules ⇒ use common (economic) sense (and look at data/test)
Interpretation and prediction with log-transforms: Three
cases
(i) y = β0 + β1 log x + u,   (ii) log y = β0 + β1 x + u,   (iii) log y = β0 + β1 log x + u.

The interpretation of β1 is different in each case. For example:
Linear-Log model (i):

∂E[y|x]/∂x = ∂[β0 + β1 log x]/∂x = β1 ∂ log x/∂x = β1/x.

Consider the effect of a 1% change in x ⇒ ∆x/x = 1/100 ⇒ ∆E[y|x] ≈ β1 ∆x/x = β1/100.
But this is an approximation, since the derivative measures the impact of an infinitesimal change. In reality,

∆E[y|x] = β1 {log(x + ∆x) − log(x)} ≠ β1 ∆x/x.

So do not use β1 ∆x/x when ∆x is "large".
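To see how the ∆x/x approximation deteriorates, here is a small numeric check; the value of β1 and the x's are made-up numbers, purely for illustration:

```python
import math

beta1 = 5.0   # hypothetical coefficient, chosen only for illustration
x = 100.0

def exact_change(x, dx):
    # Exact change in E[y|x]: beta1 * {log(x + dx) - log(x)}
    return beta1 * (math.log(x + dx) - math.log(x))

def approx_change(x, dx):
    # Derivative-based approximation: beta1 * dx / x
    return beta1 * dx / x

# A 1% change (dx = 1 when x = 100): the two agree closely
print(round(exact_change(x, 1.0), 4), round(approx_change(x, 1.0), 4))    # 0.0498 0.05

# A 100% change: the approximation overstates the effect substantially
print(round(exact_change(x, 100.0), 4), round(approx_change(x, 100.0), 4))  # 3.4657 5.0
```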
Prediction with log-response
When y is log-transformed, prediction has to be done carefully since E[y|x] ≠ exp(E[log y|x]).
Here, for any function f(x),

log y = β0 + β1 f(x) + u  ⇔  y = exp(β0 + β1 f(x)) exp(u),

and so

E[y|x] = exp(β0 + β1 f(x)) E[exp(u)|x],

where, according to Jensen's Inequality¹,

E[exp(u)|x] > exp(E[u|x]) = exp(0) = 1.

Special (but unrealistic) case: E[exp(u)|x] = E[exp(u)] = α0, in which case

E[y|x] = α0 exp(β0 + β1 f(x)).

We can then estimate α0 by

α̂0 = (1/n) Σ_{i=1}^n exp(ûi).
1 https://en.wikipedia.org/wiki/Jensen
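As a sanity check on the correction factor, one can simulate normal errors, for which E[exp(u)] = exp(σ²/2) > 1 by Jensen's Inequality. A minimal sketch on simulated data, with the true errors standing in for the OLS residuals ûi:

```python
import math
import random

random.seed(42)
n = 100_000
sigma = 1.0

# Draw errors u ~ N(0, sigma^2); for normal u, E[exp(u)] = exp(sigma^2 / 2)
us = [random.gauss(0.0, sigma) for _ in range(n)]

# The estimator from the slide, computed here on the (known) errors;
# in practice one would use the OLS residuals u_hat_i instead
alpha0_hat = sum(math.exp(u) for u in us) / n

# alpha0_hat should be close to exp(0.5) ~ 1.65, and well above 1
print(alpha0_hat, math.exp(sigma ** 2 / 2))
```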
Polynomial regressions (simple case)
Generally, if we worry about non-linearities (after potential log-transforms), these can be captured by adding polynomial terms,

y = β0 + β1 x + β2 x² + ... + βk x^k + u.

In most applications, adding quadratic (k = 2) and/or cubic (k = 3) terms suffices.
As with log-transforms, the non-linear components make interpretation of the coefficients difficult. E.g., with k = 2,

∆E[y|x]/∆x ≈ ∂E[y|x]/∂x = β1 + 2β2 x  and so  ∆E[y|x] ≈ (β1 + 2β2 x) ∆x,

but this is only a good approximation when ∆x is "small".
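A quick numeric illustration with made-up coefficients, showing that (β1 + 2β2 x)∆x tracks the true change in E[y|x] when ∆x is small:

```python
beta0, beta1, beta2 = 1.0, 0.8, -0.02  # hypothetical values for illustration

def conditional_mean(x):
    # E[y|x] in the quadratic model (k = 2)
    return beta0 + beta1 * x + beta2 * x ** 2

def marginal_effect(x):
    # dE[y|x]/dx = beta1 + 2*beta2*x: the slope depends on where it is evaluated
    return beta1 + 2 * beta2 * x

x, dx = 5.0, 0.1  # a small change, where the approximation is accurate
exact = conditional_mean(x + dx) - conditional_mean(x)
approx = marginal_effect(x) * dx
print(round(exact, 4), round(approx, 4))  # 0.0598 0.06
```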
Example: Polynomial wage equation
log(pay) = β0 + β1 educ + β2 exp + β3 exp² + u

The partial effect of exp(erience) on log(pay) is

∆E[log(pay)|educ, exp]/∆exp ≈ ∂E[log(pay)|educ, exp]/∂exp = β2 + 2β3 exp.

Thus,

∆ predicted log(pay)/∆exp ≈ β̂2 + 2β̂3 exp,

so the estimated effect varies with the level of exp.
Example: Polynomial wage equation
Different lines represent predicted ln(wage) at different levels of education

log(pay) = β0 + β1 educ + β2 exp + β3 exp² + u
Interactions
The quadratic term, xj², is a special case of so-called interaction terms, where one includes, e.g., xj1 xj2.
Suppose we want to allow for the possibility that higher-education individuals have steeper experience profiles. This can be captured by

log(pay) = β0 + β1 educ + β2 exp + β3 exp·educ + u.

Now the partial effect of exp on log(pay) is

∆E[log(pay)|educ, exp]/∆exp = β2 + β3 educ.

We can test for interaction effects: H0 : β3 = 0 against the two-sided alternative.
To allow for non-linear interactions, we can extend the model:

log(pay) = β0 + β1 educ + β2 exp + β3 exp² + β4 exp·educ + β5 exp²·educ + u.
Example: Returns to experience, at different education levels
Interaction between years_of_school and experience
Example: Nonlinear returns to experience, at different education levels
Interaction between education and quadratic experience profile
Goodness-of-fit/Model selection
So which model should we use? Usually we think of a good model as one which gives us unbiased (and precisely estimated) effects.
One particular bias we are concerned with is omitted variable bias – to avoid this, it is tempting to pursue a kitchen-sink approach: throw in all variables at your disposal.
But that's a bad idea! We know the overall variance of OLS will generally increase if irrelevant regressors are included (see the expression for the variance of OLS in MLR). Variance ↑ ⇒ less precise inference ⇒ weaker conclusions can be drawn.
How do we balance the risks of bias and imprecise inference? Can we come up with a good model selection rule?
Some goodness-of-…t measures
One measure of the quality of a given regression is R². But R² increases mechanically as we add more regressors ⇒ R² is not a very useful guide for evaluating models.
An alternative measure is the F statistic associated with going from a large to a small model. This uses the relative change in R-squared and is generally more useful. But it is not always adequate:
The pool of candidate regressors may be large and/or the functional form specification not evident
Models may be non-nested
The sample size may be "small"
Multiple testing problem (size control when carrying out multiple F-tests is difficult)
Adjusted R-squared
Can we adjust the standard R² so that it is better suited for model selection? Recall the definition of R²:

R² = 1 − SSR/SST = 1 − (SSR/n)/(SST/n)

The adjusted R², R̄², "penalizes" big models,

R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]

where k = the number of covariates. As k ↑: SSR ↓ but at the same time 1/(n − k − 1) ↑.
Bottom line: R̄² can go up or down when adding an explanatory variable to the model. It provides a fairer comparison of different specifications/models.
Alternative selection rules exist: Akaike’s Information Criterion (AIC),
Schwarz’s BIC, etc. – not covered here.
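The penalty is easy to see numerically. A small sketch with made-up SSR/SST values, showing R̄² falling when an extra regressor barely reduces SSR:

```python
def r_squared(ssr, sst):
    return 1 - ssr / sst

def adj_r_squared(ssr, sst, n, k):
    # Dividing SSR by n - k - 1 penalizes larger models
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

sst, n = 100.0, 52  # made-up numbers for illustration

# Adding a regressor (k: 2 -> 3) that lowers SSR only from 40 to 39.9:
print(round(adj_r_squared(40.0, sst, n, 2), 4))  # 0.5837
print(round(adj_r_squared(39.9, sst, n, 3), 4))  # 0.5761 -- adjusted R^2 falls

# ...even though the ordinary R^2 mechanically rises
print(r_squared(40.0, sst) < r_squared(39.9, sst))  # True
```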
Adding more regressors may change interpretation
However, in some cases, adding regressors will change the interpretation of regression coefficients
In particular, be careful when including other outcomes related to y in a regression
Example: suppose we wish to estimate the effect of parents' income on the child's income. Should we control for the child's education?
Intergenerational wage elasticity – with education as
control
Simultaneity affects interpretation
Suppose that the true model is

lnpay = β0 + β1 lnfaminc + β2 edu + u,
edu = δ0 + δ1 lnfaminc + e,

where edu is the individual's years of education. Then lnfaminc affects lnpay through both channels:
directly, perhaps because higher income can purchase higher-quality schooling when young
indirectly, because it provides more resources for additional years of school
So edu is (partially) an outcome of higher income
Two channels....
... affect the overall effect
Assume E[u|lnfaminc] = E[e|lnfaminc] = 0, so unobservables (such as motivation, preference for education) that affect schooling and pay are uncorrelated with family income.
Then (ignoring the intercepts)

E[lnpay|lnfaminc] = β1 lnfaminc + β2 E[edu|lnfaminc] = (β1 + β2 δ1) lnfaminc

since E[edu|lnfaminc] = δ1 lnfaminc (again up to an intercept).
So the FULL causal effect of income on pay is (β1 + β2 δ1), not just β1.
OLS would have captured the full effect if we had left out edu...
A more complete treatment of simultaneous equations models will be given next term
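A small simulation of the two-equation system above (with intercepts set to zero and made-up coefficient values) confirms that the slope of E[lnpay|lnfaminc] is β1 + β2 δ1 rather than β1:

```python
import random

random.seed(1)
beta1, beta2, delta1 = 0.2, 0.05, 1.5  # hypothetical values for illustration

def draw_lnpay(lnfaminc):
    e = random.gauss(0.0, 1.0)
    u = random.gauss(0.0, 1.0)
    edu = delta1 * lnfaminc + e                # intercepts set to 0 for simplicity
    return beta1 * lnfaminc + beta2 * edu + u

# Average lnpay at two income levels one unit apart
n = 200_000
lo = sum(draw_lnpay(1.0) for _ in range(n)) / n
hi = sum(draw_lnpay(2.0) for _ in range(n)) / n

# The difference estimates the full effect beta1 + beta2*delta1 = 0.275, not beta1
print(hi - lo, beta1 + beta2 * delta1)
```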
Be careful when including outcome variables in the
regression
Including outcome variables in a regression can sometimes be helpful
It can help show the channels by which x affects y
e.g., roughly half of the effect of parents' income on children's income is through the education channel.
The "direct effect" then captures all the channels not captured through the modeled channel (e.g., nutrition)
But BE CAREFUL: IT IS NOT CORRECT to say that increasing the parents' income by 1% only increases the child's pay by .17%, because the child's schooling is an omitted variable in the regression of the child's income on the parents' income.
Adding more variables: summary
1 When it solves an omitted variable bias problem
2 When we think there is good reason to allow more flexibility in how the x variables affect y
3 Adding more variables
Always increases R²
Can either increase or reduce R̄²
Can either increase or reduce standard errors
BE CAREFUL: Adding other outcomes of the variable of interest in a regression will change the interpretation of the coefficients.
Continuous versus discrete random variables
Explanatory variables can either be
Continuous: In this case, we can talk about infinitesimal changes.
Discrete/categorical/ordinal (i.e., discrete, not continuous): an infinitesimal change is nonsensical.
Dummy variables: A special type of categorical variable with only 2 possible values (which we can always choose as 0 and 1).
Examples of dummy variables: gender, marital status, work, etc.

femalei = 1 if individual i is female, 0 if individual i is male
Dummy variables as regressors – different interpretation
We can include dummy variables as regressors just as with any other variable. For example,

log(pay) = β0 + β1 lnfaminc + β2 female + u.

But the interpretation is different due to their discrete nature:

β1 = ∂E[log(pay)|lnfaminc, female]/∂lnfaminc,
β2 = E[log(pay)|lnfaminc, female = 1] − E[log(pay)|lnfaminc, female = 0].

β1 measures the infinitesimal change, and we have ∆log(pay) ≈ β1 ∆lnfaminc.
β2 is an intercept shift.
Dummy variables – How it looks
Reference category
We can change the reference category without losing information:

log(pay) = β0 + β1 educ + β2 female + u
         = β0 + β1 educ + β2 (1 − male) + u
         = (β0 + β2) + β1 educ − β2 male + u
         = α0 + α1 educ + α2 male + u

where α0 = (β0 + β2), α1 = β1, α2 = −β2.
But the interpretation of the coefficients changes accordingly
Note that female and male cannot both be included since they are perfectly collinear
Regressors with multiple discrete outcomes
Suppose x ∈ {1, ..., J} is a categorical variable taking on J ≥ 3 discrete values: e.g., occupation, region, educational degree, ...
We could potentially just include x in the regression – but this assumes that the effect of x changing from 1 to 2 is the same as from 2 to 3.
A more flexible model allowing for heterogeneous effects across categories: create J − 1 dummy variables (where we omit one, say the last, x = J):

dj = 1 if x = j, 0 otherwise,   for j = 1, ..., J − 1.
Dummy variables for categorical information
We can then estimate

y = β0 + Σ_{j=1}^{J−1} βj dj + u

where the interpretation of the regression coefficients is:
β0 = E[y|x = J] is here the average outcome of those in the reference group
βj = E[y|x = j] − E[y|x = J] is the average outcome relative to the omitted category
So to compare group j to group k we need to compare their coefficients:

βj − βk = E[y|x = j] − E[y|x = k]
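Since OLS on a full set of J − 1 dummies plus an intercept simply fits the group means, the coefficients can be read off from group averages. A sketch with made-up data (J = 3, with x = 3 as the omitted reference group):

```python
# Made-up (x, y) observations with three categories
data = [(1, 9.0), (1, 11.0), (2, 11.5), (2, 12.5), (3, 14.0), (3, 16.0)]

def group_mean(j):
    ys = [y for (x, y) in data if x == j]
    return sum(ys) / len(ys)

# OLS on d1, d2 (plus intercept) fits each group mean exactly, so:
beta0 = group_mean(3)                                   # mean of reference group
betas = {j: group_mean(j) - group_mean(3) for j in (1, 2)}

print(beta0)                 # 15.0
print(betas[1], betas[2])    # -5.0 -3.0
print(betas[1] - betas[2])   # -2.0 = E[y|x=1] - E[y|x=2]
```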
Example: Returns to Education
The STATA command gen(Dschool) creates the dummy variables Dschool1, ..., Dschool12. For example:
Dschool1 = 1 ⇒ spent 10 years in school ("left school at age 15")
Dschool7 = 1 ⇒ spent 16 years in school ("left school at age 21")
Education dummy variables
Heterogeneous returns to education
Discrete outcome variables: The binary case
It is also possible to use OLS with qualitative outcome variables:
Suppose y is binary, taking two values, y ∈ {0, 1}.
Numerous examples, such as:
College educated (y = 1) vs. non-college educated (y = 0)
Working (y = 1) vs. not working (y = 0)
MLR with a qualitative outcome variable is called a Linear Probability Model
Linear Probability (LP) Model
Consider the following MLR for a binary outcome variable y ∈ {0, 1}:

y = β0 + β1 x1 + ... + βk xk + u

Since y is binary,

E[y|x1, x2, ..., xk] = Pr(y = 1|x1, x2, ..., xk)

and so the MLR implies

Pr(y = 1|x1, x2, ..., xk) = β0 + β1 x1 + ... + βk xk

This means that βj measures how the probability of success (y = 1) changes when xj changes by the amount ∆xj, everything else fixed:

∆Pr(y = 1|x1, x2, ..., xk) = βj ∆xj  ⇒  ∆Pr(y = 1|x1, x2, ..., xk)/∆xj = βj

For example, the increase in the probability of working given a change in education, holding experience (and other factors) fixed
Shortcoming of LP Model
While the LP model is easy to estimate, it has some shortcomings:
We can get predictions that fall outside [0, 1] – inconsistent with the fact that we are estimating probabilities
It does not work well for values of the x's far away from the sample averages – so be cautious in interpreting the estimates for such values of x.
It is inherently heteroskedastic, since

Var(y|x) = Pr(y = 1|x1, x2, ..., xk)(1 − Pr(y = 1|x1, x2, ..., xk))

The formula above is just a generalization of the unconditional variance of a binary random variable, Var(y) = p(1 − p)
Note that if Pr(y = 1|x1, x2, ..., xk) = 0 or 1, then Var(y|x1, x2, ..., xk) = 0
Note that if Pr(y = 1|x1, x2, ..., xk) = 0.5, then Var(y|x1, x2, ..., xk) = 0.25.
The conditional variance of y varies with x – so there is heteroskedasticity!
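The variance values quoted on the slide are just p(1 − p) evaluated at different p; a one-line check:

```python
def lpm_var(p):
    # Var(y|x) = p*(1 - p) for a binary y with p = Pr(y = 1 | x)
    return p * (1 - p)

print(lpm_var(0.0), lpm_var(1.0))  # 0.0 0.0  -- no variance when p is 0 or 1
print(lpm_var(0.5))                # 0.25     -- the maximum possible variance
# Since p = beta0 + beta1*x1 + ... varies with x, so does Var(y|x):
# the LP model is heteroskedastic by construction.
```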
Thus, our current tools for inference do not apply here!
Example of LP Model: Labour force participation (inlf = 1)
Inconsistent predictions with LP model
The command "predict" creates ŷ = β̂0 + β̂1 x1 + ... + β̂k xk
8 of these predictions fall outside the unit interval...
Interaction models with dummy variables
We have seen how to (i) use dummy variables and (ii) interactions
Interactions with dummy variables are very useful to allow for heterogeneous intercepts and/or slopes (often called "structural breaks")
As an example, take the wage equation

log(pay) = β0 + β1 years_of_school + β2 high_exp + β3 years_of_school·high_exp + u

where high_exp = 1 if experience > 20 years
We allow
the level of the wage equation (i.e. the intercept) to differ between the high- and low-experienced
a differential return to education (i.e. the slope) between the high- and low-experienced
Interpretation of model
Our fully interacted model,

log(pay) = β0 + β1 years_of_school + δ0 high_exp + δ1 years_of_school·high_exp + u,

implies two separate wage equations, one for low experience, one for high experience:

log(pay) = β0 + β1 years_of_school + u,  if high_exp = 0
log(pay) = α0 + α1 years_of_school + u,  if high_exp = 1

where α0 = (β0 + δ0), α1 = (β1 + δ1). Thus, much more flexible.
Structural Breaks
Our dummy-interaction model is a special case of a structural break model.
Definition: A structural break or change occurs when (at least some of) the model parameters differ across subsets of the sample
Examples:
In time series, coefficients may differ across time periods
In labor economics, coefficients may differ by race, gender, or experience
In cross-country analyses, coefficients may differ by government type
We can test for structural breaks using an F-test (often called a "Chow Test" when testing for structural breaks)
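The Chow test compares the SSR from a pooled regression with the SSRs from separate group regressions. A minimal sketch of the mechanics for a simple regression y = a + b·x across two groups, with made-up data (this is not the lecture's dataset):

```python
def ols_ssr(pairs):
    # Fit y = a + b*x by OLS and return the sum of squared residuals
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in pairs)

# Made-up data: group 1 has a visibly steeper slope than group 0
group0 = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1)]
group1 = [(1, 3.2), (2, 6.1), (3, 8.8), (4, 12.1)]

ssr_restricted = ols_ssr(group0 + group1)             # pooled: one (a, b) for all
ssr_unrestricted = ols_ssr(group0) + ols_ssr(group1)  # separate (a, b) per group

# Chow F-statistic: q = 2 restrictions (common intercept and slope);
# the unrestricted model uses 4 parameters on n = 8 observations
q, n, params = 2, 8, 4
F = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - params))
print(round(F, 1))  # a large F -> reject parameter stability across groups
```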
Wage equations by experience level
Interpretation of results
Difference in intercepts between high and low experience = 4.894727 − 4.72645 = .1682767.
This is the difference between high and low (potential) experience people, evaluated at 0 years of education
Difference in slopes = .1019567 − .0959294 = .0060273
Are these differences significant? Use the "Chow Test"!
Chow Test in action
Chow F-statistic = 755.69 with p-value = 0.0000 ⇒ reject the null of no structural break