ECON0019 Lecture7 Slides

The lecture covers advanced topics in linear regression specification, including transformations, interactions, and model selection techniques such as adjusted R-squared. It emphasizes the importance of careful model specification to avoid omitted variable bias and discusses the implications of including outcome variables in regression analyses. Additionally, it highlights the use of logarithmic and polynomial transformations to capture nonlinear relationships in economic data.


ECON0019: Quantitative Economics and

Econometrics
Lecture 7: Specification + Qualitative Data

Professor Dennis Kristensen


UCL

Autumn 2018

Today’s lecture

More on specification of linear regressions. How to use:

transformations (e.g. logarithmic)
interactions
to estimate (more) flexible models
Model selection
The adjusted R-squared
The trade-offs when adding explanatory variables to the model
Dummy variables
(Testing for) structural breaks

Functional form
Allowing for nonlinearities in x and y

MLR allows us to estimate, given data on y and m ≥ 1 regressors,

g(y) = β0 + ∑_{j=1}^k βj fj(x1, ..., xm) + u,

for known transformations g(y) and fj(x1, ..., xm), j = 1, ..., k. We can:

transform our variables (outcome and explanatory), e.g., g(y) = log(y), fj(x1, ..., xm) = log(xj)
add higher-order terms of explanatory variables, e.g., fj(x1, ..., xm) = xj²
have interactions of explanatory variables, e.g., fj(x1, ..., xm) = xj · xk.

But which transformations should we use (if any), and how do we interpret the resulting model?

General approach to modeling nonlinearities

1. Identify a possible nonlinear relationship: economic theory, similar existing applications (literature search)
2. Specify the nonlinear relationship (some examples on the next slides)
3. Determine whether the nonlinear specification improves over the linear one: compare (test) the nonlinear model with the linear one
4. Eyeballing: compare the fitted model with the actual data (scatter plot + predicted values), look at residuals (ideally, they should be close to normally distributed and homoskedastic)

Logarithmic transformations

Logarithmic transformations are often useful when working with economic data:
many variables have lots of values close to 0
but also a few very large values (i.e., outliers), and OLS is very sensitive to these "large" values
Taking the logarithm narrows the range of the variable ⇒ reduces the impact of outliers
After taking logarithms, the distribution of the variable may look more normal (and so OLS is better behaved / closer to normally distributed)
Often log-transformed: amounts of money (wages, expenditures, sales, market value), size (city, firm)
but no clear rules ⇒ use common (economic) sense (and look at the data / test)

Interpretation and prediction with log-transforms: Three
cases

(i) y = β0 + β1 log x + u,   (ii) log y = β0 + β1 x + u,   (iii) log y = β0 + β1 log x + u.

The interpretation of β1 is different in each case. For example:
Linear-log model (i):

∂E[y|x]/∂x = ∂[β0 + β1 log x]/∂x = β1 ∂log x/∂x = β1/x.

Consider the effect of a 1% change in x: Δx/x = 1/100 ⇒ ΔE[y|x] ≈ β1 Δx/x = β1/100.
But this is an approximation, since the derivative measures the impact of an infinitesimal change. In reality,

ΔE[y|x] = β1 {log(x + Δx) − log(x)} ≠ β1 Δx/x.

So do not use β1 Δx/x when Δx is "large".
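The gap between the exact change β1{log(x + Δx) − log(x)} and the derivative-based approximation β1 Δx/x is easy to check numerically. A minimal sketch, with made-up values of β1 and x:

```python
import numpy as np

beta1, x = 0.8, 50.0  # hypothetical linear-log coefficient and starting value of x

for dx in (0.5, 5.0, 25.0):  # a small, a moderate, and a "large" change in x
    exact = beta1 * (np.log(x + dx) - np.log(x))  # true change in E[y|x]
    approx = beta1 * dx / x                       # derivative-based approximation
    print(f"dx={dx:5.1f}  exact={exact:.4f}  approx={approx:.4f}")
```

For Δx = 0.5 (a 1% change) the two agree to several decimals; for Δx = 25 (a 50% change) the approximation overstates the change noticeably.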
Prediction with log-response
When y is log-transformed, prediction has to be done carefully since E[y|x] ≠ exp(E[log y|x]).
Here, for any function f(x),

log y = β0 + β1 f(x) + u  ⇔  y = exp(β0 + β1 f(x)) exp(u),

and so

E[y|x] = exp(β0 + β1 f(x)) E[exp(u)|x],

where, according to Jensen's Inequality¹,

E[exp(u)|x] > exp(E[u|x]) = exp(0) = 1.

Special (but unrealistic) case: E[exp(u)|x] = E[exp(u)] = α0, in which case

E[y|x] = α0 exp(β0 + β1 f(x)).

We can then estimate α0 by

α̂0 = (1/n) ∑_{i=1}^n exp(ûi).
1 https://en.wikipedia.org/wiki/Jensen
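The estimator α̂0 = (1/n) ∑ exp(ûi) is easy to simulate. A sketch under an invented data-generating process, assuming (as in the special case above) that u is independent of x:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(1.0, 10.0, n)
u = rng.normal(0.0, 0.5, n)            # error independent of x, so E[exp(u)|x] = E[exp(u)]
log_y = 1.0 + 0.3 * x + u
y = np.exp(log_y)

b1, b0 = np.polyfit(x, log_y, 1)       # OLS of log y on x
u_hat = log_y - (b0 + b1 * x)
alpha0_hat = np.mean(np.exp(u_hat))    # estimate of alpha0 = E[exp(u)]

naive = np.exp(b0 + b1 * x)            # exp(predicted log y): ignores E[exp(u)] > 1
corrected = alpha0_hat * naive
print(alpha0_hat)
print(y.mean(), naive.mean(), corrected.mean())
```

The naive back-transform systematically underestimates the mean of y; multiplying by α̂0 largely removes the bias. (With normal u, E[exp(u)] = exp(σ²/2) ≈ 1.13 here.)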
Polynomial regressions (simple case)

Generally, if we worry about nonlinearities (after potential log-transforms), these can be captured by adding polynomial terms,

y = β0 + β1 x + β2 x² + ... + βk x^k + u.

In most applications, adding quadratic (k = 2) and/or cubic (k = 3) terms suffices.
As with log-transforms, the nonlinear components make interpretation of the coefficients difficult. E.g., with k = 2,

ΔE[y|x]/Δx ≈ ∂E[y|x]/∂x = β1 + 2β2 x   and so   ΔE[y|x] ≈ (β1 + 2β2 x) Δx,

but this is only a good approximation when Δx is "small".

Example: Polynomial wage equation

log(pay) = β0 + β1 educ + β2 exp + β3 exp² + u

The partial effect of exp(erience) on log(pay) is

ΔE[log(pay)|educ, exp]/Δexp ≈ ∂E[log(pay)|educ, exp]/∂exp = β2 + 2β3 exp.

Thus,

Δ(predicted log(pay))/Δexp ≈ β̂2 + 2β̂3 exp,

so the estimated effect varies with the level of exp
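The level-dependent effect β̂2 + 2β̂3·exp can be seen in a quick simulation. A sketch with invented coefficients (a concave experience profile):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
educ = rng.uniform(8.0, 18.0, n)
exper = rng.uniform(0.0, 40.0, n)
# hypothetical DGP: positive but diminishing returns to experience
log_pay = 0.5 + 0.08 * educ + 0.04 * exper - 0.0006 * exper**2 + rng.normal(0.0, 0.3, n)

X = np.column_stack([np.ones(n), educ, exper, exper**2])
b = np.linalg.lstsq(X, log_pay, rcond=None)[0]

for e in (5, 15, 30):
    print(e, b[2] + 2 * b[3] * e)  # estimated partial effect of exper at that level
```

The estimated effect is large early in the career and shrinks toward zero at high experience, mirroring the negative quadratic coefficient.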

Example: Polynomial wage equation
Different lines represent predicted ln(wage) at different levels of education

log(pay) = β0 + β1 educ + β2 exp + β3 exp² + u

Interactions
A quadratic term, xj², is a special case of so-called interaction terms, where you include, e.g., xj1 · xj2.
Suppose we want to allow for the possibility that higher-education individuals have steeper experience profiles. This can be captured by

log(pay) = β0 + β1 educ + β2 exp + β3 exp · educ + u.

Now the partial effect of exp on log(pay) is

ΔE[log(pay)|educ, exp]/Δexp = β2 + β3 educ

We can test for interaction effects, H0: β3 = 0, against the two-sided alternative
To allow for non-linear interactions, we can extend the model:

log(pay) = β0 + β1 educ + β2 exp + β3 exp² + β4 exp · educ + β5 exp² · educ + u.
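The partial effect β2 + β3·educ can be recovered by including the interaction term as an extra regressor. A sketch with made-up numbers, where the return to experience rises with education by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
educ = rng.uniform(8.0, 18.0, n)
exper = rng.uniform(0.0, 30.0, n)
# hypothetical DGP: interaction coefficient 0.002, so each extra year of
# schooling raises the return to experience by 0.2 percentage points
log_pay = 0.5 + 0.08 * educ + 0.01 * exper + 0.002 * educ * exper + rng.normal(0.0, 0.2, n)

X = np.column_stack([np.ones(n), educ, exper, educ * exper])
b = np.linalg.lstsq(X, log_pay, rcond=None)[0]

for ed in (10, 16):
    print(ed, b[2] + b[3] * ed)  # estimated partial effect of exper at educ = ed
```

The printed effects differ across education levels, exactly as the formula β2 + β3·educ predicts.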
Example: Returns to experience, at different education levels
Interaction between years_of_school and experience

Example: Nonlinear returns to experience, at different education levels
Interaction between education and a quadratic experience profile

Goodness-of-fit/Model selection

So which model should we use? Usually we think of a good model as one which gives us unbiased (and precisely estimated) effects.
One particular bias we are concerned with is omitted variable bias – to avoid it, it is tempting to pursue a

kitchen sink approach: throw in all variables at your disposal.

But that's a bad idea! We know the overall variance of OLS will generally increase if irrelevant regressors are included (see the expression for the variance of OLS in MLR). Variance ↑ ⇒ less precise inference ⇒ less strong conclusions can be drawn.
How do we balance the risks of bias and imprecise inference? Can we come up with a good model selection rule?

Some goodness-of-…t measures

One measure of the quality of a given regression is R². But R² increases mechanically as we add more regressors ⇒ R² is not a very useful guide for evaluating models.
An alternative measure is the F statistic associated with going from a large to a small model. This uses the relative change in R-squared and is generally more useful. But it is not always adequate:
The pool of candidate regressors may be large and/or the functional form specification not evident
Models may be non-nested
The sample size may be "small"
Multiple testing problem (size control when carrying out multiple F-tests is difficult)

Adjusted R-squared
Can we adjust the standard R² so that it is better suited for model selection? Recall the definition of R²:

R² = 1 − SSR/SST = 1 − (SSR/n)/(SST/n)

The adjusted R², R̄², "penalizes" big models,

R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]

where k = the number of covariates. As k ↑: SSR ↓, but at the same time 1/(n − k − 1) ↑.
Bottom line: R̄² can go up or down when adding an explanatory variable to the model. It provides a fairer comparison of different specifications/models.
Alternative selection rules exist: Akaike's Information Criterion (AIC), Schwarz's BIC, etc. – not covered here.
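The two formulas are easy to compute side by side. A sketch that adds pure-noise regressors to a simple invented model; R² necessarily rises, while R̄² carries the degrees-of-freedom penalty and always sits at or below R²:

```python
import numpy as np

def r2_and_adjusted(y, X):
    """Return (R^2, adjusted R^2) for OLS of y on X (X includes the intercept column)."""
    n, k_plus_1 = X.shape                      # k_plus_1 = k + 1 covariates incl. intercept
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ssr / sst
    r2_bar = 1.0 - (ssr / (n - k_plus_1)) / (sst / (n - 1))
    return r2, r2_bar

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, rng.normal(size=(n, 5))])  # 5 irrelevant regressors
print(r2_and_adjusted(y, X_small))
print(r2_and_adjusted(y, X_big))
```

Comparing the two printouts: the plain R² cannot fall when the noise columns are added, whereas R̄² may go either way, which is what makes it usable for model comparison.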
Adding more regressors may change interpretation

However, in some cases, adding regressors will change the interpretation of the regression coefficients
In particular, be careful when including other outcomes related to y in a regression
Example: suppose we wish to estimate the effect of parents' income on the child's income. Should we control for the child's education?

Intergenerational wage elasticity – with education as
control

Simultaneity affects interpretation

Suppose that the true model is

lnpay = β0 + β1 lnfaminc + β2 edu + u,
edu = δ0 + δ1 lnfaminc + e,

where edu is the individual's years of education. Then lnfaminc affects lnpay through both channels:
directly, perhaps because higher income can purchase higher-quality schooling when young
indirectly, because it provides more resources for additional years of school
So edu is (partially) an outcome of higher income

Two channels....

... affect the overall effect

Assume E[u|lnfaminc] = E[e|lnfaminc] = 0, so unobservables (such as motivation or preference for education) that affect schooling and pay are uncorrelated with family income.
Then (ignoring the intercepts)

E[lnpay|lnfaminc] = β1 lnfaminc + β2 E[edu|lnfaminc] = (β1 + β2 δ1) lnfaminc

since E[edu|lnfaminc] = δ1 lnfaminc.
So the FULL causal effect of income on pay is (β1 + β2 δ1), not just β1.
OLS would have captured the full effect if we had left out edu...
A more complete treatment of simultaneous equations models will be given next term

Be careful when including outcome variables in the
regression

Including outcome variables in a regression can sometimes be helpful
It can help show the channels by which x affects y
e.g., roughly half of the effect of parents' income on children's income is through the education channel.
The "direct effect" then captures all the channels not captured through the modeled channel (e.g., nutrition)
But BE CAREFUL: IT IS NOT CORRECT to say that increasing parents' income by 1% only increases the child's pay by .17%, because the child's schooling is an omitted variable in the regression of the child's income on the parents' income.

Adding more variables: summary

1. When it solves an omitted variable bias problem
2. When we think there is good reason to allow more flexibility in how the x variables affect y
3. Adding more variables
always increases R²
can either increase or reduce R̄²
can either increase or reduce standard errors
BE CAREFUL: adding other outcomes of the variable of interest in a regression will change the interpretation of its coefficient.

Continuous versus discrete random variables

Explanatory variables can either be
Continuous: in this case, we can talk about an infinitesimal change.
Discrete/categorical/ordinal (i.e., discrete, not continuous): an infinitesimal change is nonsensical.
Dummy variables: a special type of categorical variable with only 2 possible values (which we can always choose as 0 and 1).
Examples of dummy variables: gender, marital status, work, etc.

female_i = 1 if individual i is female, 0 if individual i is male

Dummy Variables as regressors - different interpretation

We can include dummy variables as regressors just as with any other variable. For example,

log(pay) = β0 + β1 lnfaminc + β2 female + u.

But the interpretation is different due to the dummy's discrete nature:

β1 = ∂E[log(pay)|lnfaminc, female]/∂lnfaminc,
β2 = E[log(pay)|lnfaminc, female = 1] − E[log(pay)|lnfaminc, female = 0].

β1 measures the effect of an infinitesimal change, and we have Δlog(pay) ≈ β1 Δlnfaminc
β2 is an intercept shift.

Dummy variables – How it looks

Reference category

We can change the reference category without losing information:

log(pay) = β0 + β1 educ + β2 female + u
         = β0 + β1 educ + β2 (1 − male) + u
         = (β0 + β2) + β1 educ − β2 male + u
         = α0 + α1 educ + α2 male + u

where α0 = (β0 + β2), α1 = β1, α2 = −β2.

But the interpretation of the coefficients changes accordingly
Note that female and male cannot both be included since they are perfectly collinear
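Switching the reference category is a pure reparametrization, as a small simulation confirms (variable names and coefficient values here are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
educ = rng.uniform(8.0, 18.0, n)
female = (rng.uniform(size=n) < 0.5).astype(float)
log_pay = 1.0 + 0.08 * educ - 0.2 * female + rng.normal(0.0, 0.3, n)

def ols(dummy):
    X = np.column_stack([np.ones(n), educ, dummy])
    return np.linalg.lstsq(X, log_pay, rcond=None)[0]

b_f = ols(female)        # reference category: male
b_m = ols(1.0 - female)  # reference category: female
print(b_f[2], b_m[2])    # dummy coefficients are equal and opposite (alpha2 = -beta2)
print(b_f[1], b_m[1])    # slope on educ is unchanged (alpha1 = beta1)
```

The fitted values and R² are identical in the two runs; only the labeling of the intercept and the dummy coefficient changes.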

Regressors with multiple discrete outcomes

Suppose x ∈ {1, ..., J} is a categorical variable taking on J ≥ 3 discrete values: e.g., occupation, region, educational degree, ...
We could potentially just include x in the regression – but this assumes that the effect of x changing from 1 to 2 is the same as from 2 to 3.
A more flexible model allowing for heterogeneous effects across categories: create J − 1 dummy variables (where we omit one, say the last, x = J)

dj = 1 if x = j, 0 otherwise,   for j = 1, ..., J − 1.

Dummy variables for categorical information

We can then estimate

y = β0 + ∑_{j=1}^{J−1} βj dj + u

where the interpretation of the regression coefficients is:

β0 = E[y|x = J] is here the average outcome of those in the reference group
βj = E[y|x = j] − E[y|x = J] is the average outcome relative to the omitted category
So to compare group j to group k we need to compare their coefficients:

βj − βk = E[y|x = j] − E[y|x = k]
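With a full set of J − 1 dummies plus an intercept, OLS exactly reproduces the group sample means. A sketch with three made-up categories:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 600
x = rng.integers(1, 4, size=n)                 # categorical variable, J = 3
group_mean = {1: 2.0, 2: 3.5, 3: 5.0}         # hypothetical population group means
y = np.array([group_mean[v] for v in x]) + rng.normal(0.0, 1.0, n)

# dummies for j = 1, 2; category J = 3 is omitted (the reference group)
D = np.column_stack([np.ones(n), (x == 1).astype(float), (x == 2).astype(float)])
b = np.linalg.lstsq(D, y, rcond=None)[0]

m = lambda j: y[x == j].mean()
print(b[0], m(3))          # intercept equals the omitted group's sample mean
print(b[1], m(1) - m(3))   # dummy coefficient equals the difference in sample means
```

The equalities hold exactly (up to floating-point error), since the dummy regression is saturated in the categories.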

Example: Returns to Education

The STATA command gen(Dschool) creates the dummy variables Dschool1, ..., Dschool12. For example:
Dschool1 = 1 ⇒ spent 10 years in school ("left school at age 15")
Dschool7 = 1 ⇒ spent 16 years in school ("left school at age 21")
Education dummy variables

Heterogeneous returns to education
Discrete outcome variables: The binary case

It is also possible to use OLS with qualitative outcome variables:
Suppose y is binary, taking two values, y ∈ {0, 1}.
Numerous examples, such as:
College educated (y = 1) vs. non-college educated (y = 0)
Working (y = 1) vs. not working (y = 0)
MLR with a qualitative outcome variable is called a Linear Probability Model

Linear Probability (LP) Model
Consider the following MLR for a binary outcome variable y ∈ {0, 1}:

y = β0 + β1 x1 + ... + βk xk + u

Since y is binary,

E[y|x1, x2, ..., xk] = Pr(y = 1|x1, x2, ..., xk)

and so the MLR implies

Pr(y = 1|x1, x2, ..., xk) = β0 + β1 x1 + ... + βk xk

This means that βj measures how the probability of success (y = 1) changes when xj changes by the amount Δxj, everything else fixed:

ΔPr(y = 1|x1, x2, ..., xk) = βj Δxj  ⇒  ΔPr(y = 1|x1, x2, ..., xk)/Δxj = βj

For example, the increase in the probability of working given a change in education, holding experience (and other factors) fixed
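Fitting an LP model is just OLS on a 0/1 outcome, and a quick simulation shows why care is needed: the fitted line can stray outside [0, 1]. A sketch under an invented logistic data-generating process:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(0.0, 2.0, n)
p = 1.0 / (1.0 + np.exp(-1.5 * x))              # true success probability (logistic)
y = (rng.uniform(size=n) < p).astype(float)     # binary outcome

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]        # linear probability model by OLS
y_hat = X @ b
print(np.sum((y_hat < 0) | (y_hat > 1)))        # fitted "probabilities" outside [0, 1]
```

The slope β̂1 is still interpretable as the change in Pr(y = 1|x) per unit of x near the center of the data, but the out-of-range predictions illustrate the shortcoming discussed next.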
Shortcoming of LP Model
While the LP model is easy to estimate, it has some shortcomings:
We can get predictions that fall outside [0, 1] – inconsistent with the fact that we are estimating probabilities
It does not work well for values of the x's far away from the sample averages – so be cautious in interpreting the estimates for such values of x.
It is inherently heteroskedastic, since

Var(y|x) = Pr(y = 1|x1, x2, ..., xk)(1 − Pr(y = 1|x1, x2, ..., xk))

The formula above is just a generalization of the unconditional variance of a binary random variable, Var(y) = p(1 − p)
Note that if Pr(y = 1|x1, x2, ..., xk) = 0 or 1, then Var(y|x1, x2, ..., xk) = 0
Note that if Pr(y = 1|x1, x2, ..., xk) = 0.5, then Var(y|x1, x2, ..., xk) = 0.25.
The conditional variance of y varies with x – so there is heteroskedasticity!
Thus, our current tools for inference do not apply here!
Example of LP Model: Labour force participation (inlf = 1)

Inconsistent predictions with LP model

The command "predict" creates ŷ = β̂0 + β̂1 x1 + ... + β̂k xk

8 of these predictions fall outside the unit interval...

Interaction models with dummy variables

We have seen how to i) use dummy variables, and ii) interactions
Interactions with dummy variables are very useful to allow for heterogeneous intercepts and/or slopes (often called "structural breaks")
As an example, take the wage equation

log(pay) = β0 + β1 years_of_school + β2 high_exp + β3 years_of_school · high_exp + u

where high_exp = 1 if experience > 20 years
We allow
the level of the wage equation (i.e. intercept) to be different between the high- and low-experienced
a differential return to education (i.e. slope) between the high- and low-experienced
Interpretation of model

Our fully interacted model,

log(pay) = β0 + β1 years_of_school + δ0 high_exp + δ1 years_of_school · high_exp + u,

implies two separate wage equations, one for low experience, one for high experience:

log(pay) = β0 + β1 years_of_school + u,   if high_exp = 0
log(pay) = α0 + α1 years_of_school + u,   if high_exp = 1

where α0 = (β0 + δ0), α1 = (β1 + δ1). Thus, much more flexible.

Structural Breaks

Our dummy-interaction model is a special case of a structural break model.
Definition: a structural break (or change) occurs when (at least some of) the model parameters differ across subsets of the sample
Examples:
In time series, coefficients may differ across time periods
In labor economics, coefficients may differ by race or gender or experience
In cross-country analyses, coefficients may differ by government type
We can test for structural breaks using an F-test (often called a "Chow Test" when testing for structural breaks)

Wage equations by experience level

Interpretation of results

Difference in intercepts between high and low experience = 4.894727 − 4.72645 = .1682767
This is the difference between high- and low-(potential-)experience people, evaluated at 0 years of education
Difference in slopes = .1019567 − .0959294 = .0060273
Are these differences significant? Use the "Chow Test"!

Chow Test in action

Chow F-statistic = 755.69 with p-value = 0.0000 ⇒ reject the null of no structural break


