Chapter 2 provides an overview of the classical linear regression model, emphasizing its significance in econometrics for analyzing relationships between dependent and independent variables. It distinguishes regression from correlation, highlighting that regression estimates the average value of one variable based on others, while correlation measures the strength of association without implying causation. The chapter also introduces the simple linear regression model, its components, and the ordinary least squares method for estimating regression coefficients.
Chapter 2

An Overview of the Classical Linear Regression Model

Instructor: Badassa Wolteji (PhD)
Regression

• Regression is probably the single most important tool at the econometrician's disposal.
• But what is regression analysis?
• It is concerned with describing and evaluating the relationship between a given variable (the dependent variable) and one or more other variables (the independent variable(s)).
Terminology and notation

• Denote the dependent variable by y and the independent variable(s) by x1, x2, ..., xk, where there are k independent variables.
• Note that there can be many x variables, but we will limit ourselves to the case where there is only one x variable to start with.
• In our set-up, there is only one y variable.
• We later include more x's (the multiple regression case).
Terminology and notation

• In the literature, the terms dependent variable and explanatory variable are described variously. A representative list is:
  - for the dependent variable: explained variable, regressand, response variable, predictand, outcome;
  - for the explanatory variable: independent variable, regressor, predictor, covariate, control variable.
Regression is different from Correlation

• If we say y and x are correlated, it means that we are treating y and x in a completely symmetrical way.
• In regression, we treat the dependent variable (y) and the independent variable(s) (the x's) very differently.
• The y variable is assumed to be random or "stochastic", i.e. it has a probability distribution.
• The x variables are, however, assumed to have fixed ("non-stochastic") values in repeated samples.
Regression versus correlation

• Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation.
• In the association between crop yield and rainfall, for instance, there is no statistical reason to assume that rainfall does not depend on crop yield.
Regression versus correlation

• The fact that we treat crop yield as dependent on rainfall (among other things) is due to non-statistical considerations.
• Common sense suggests that the relationship cannot be reversed, for we cannot control rainfall by varying crop yield.
• A statistical relationship in itself cannot logically imply causation.
• To ascribe causality, one must appeal to a priori or theoretical considerations.
Regression versus correlation

• In correlation analysis, the primary objective is to measure the strength or degree of linear association between two variables.
• In regression analysis, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables.
• Regression and correlation have some fundamental differences.
• In regression analysis there is an asymmetry in the way the dependent and explanatory variables are treated.
Regression versus correlation

• In correlation analysis, we treat the two variables symmetrically; there is no distinction between the dependent and explanatory variables.
• Most of the regression theory dealt with here, by contrast, is conditional upon the assumption that the dependent variable is stochastic while the explanatory variables are fixed or non-stochastic.
Simple Regression

• For simplicity, say k = 1. This is the situation where y depends on only one x variable.
• Examples of the kind of relationship that may be of interest include:
  - how labour productivity varies with training types;
  - measuring the long-term relationship between crop yield and fertilizer use.
The Simple Regression Model

• Definition of the simple linear regression model, which explains the variable y in terms of the variable x:

  y = α + βx + u

  where α is the intercept and β is the slope parameter.
• y is the dependent variable (also called the explained variable, response variable, ...).
• x is the independent variable (also called the explanatory variable, regressor, ...).
• u is the error term (also called the disturbance, or the unobservables).
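To make the definition concrete, here is a minimal Stata sketch that simulates the model and then recovers its parameters; the population values α = 1 and β = 0.5, the seed, and the sample size are assumptions chosen purely for illustration:

* simulate y = alpha + beta*x + u with alpha = 1, beta = 0.5 (illustrative values)
clear
set obs 100
set seed 12345
gen x = rnormal()
gen u = rnormal()
gen y = 1 + 0.5*x + u
regress y x    // the estimated intercept and slope should be close to 1 and 0.5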
The Simple Regression Model

• Interpretation of the simple linear regression model: it studies how y varies with changes in x,

  Δy = βΔx as long as Δu = 0

• By how much does the dependent variable change if the independent variable is increased by one unit? This interpretation is only correct if all other things remain equal when the independent variable is increased by one unit.
• The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.
The Simple Regression Model

• Example: soybean yield and fertilizer,

  yield = α + β·fertilizer + u

  where u contains factors such as rainfall, land quality, and the presence of parasites. β measures the effect of fertilizer on yield, holding all other factors fixed.
• Example: a simple wage equation,

  wage = α + β·educ + u

  where u contains factors such as labor force experience, tenure with the current employer, work ethic, and intelligence. β measures the change in hourly wage given another year of education, holding all other factors fixed.
Simple Regression: An Example

• Suppose that we have the following data on agricultural productivity and primary enrollment for a country.

Year  Agriculture_VA_lab  Agri_land_hect  Agri_population  enroll_primary (% gross)
1985 139 57690000 38323000 38
1986 157 57385000 39421000 38
1987 178 57030000 40552000 41
1988 169 56775000 41719000 42
1989 164 56520000 42925000 41
1990 168 56312000 44173000 37
1991 167 56158000 45444000 33
1992 160 56105000 46741000 26
1993 172 30540000 45536000 23
1994 161 30472000 46844000 27
1995 161 30500000 48118000 31
1996 183 30500000 49347000 37
1997 181 30492000 50536000 42
1998 158 30508000 51693000 51
1999 159 30676000 52831000 50
2000 159 30662000 53957000 55
2001 168 31409000 55075000 60
2002 160 30604000 56179000 63
2003 139 31607000 57265000 65
2004 158 33101000 58326000 69
2005 174 33691000 59358000 81
2006 190 34219000 60362000 87
2007 201 35077000 61342000 95
2008 211 34513000 62294000 102
2009 219 63231000 102
2010 226 64158000 102
2011 235 65076000
Simple Regression: An Example

• We have some intuition that the beta on primary-school gross enrollment is positive, and we therefore want to find whether there appears to be a relationship between x and y given the data that we have.
• The first stage would be to form a scatter plot of the two variables.
• We can do this in Stata:

scatter agriculture_va_lab enroll_primarygross
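It can also be helpful to overlay the fitted regression line on the scatter; a small sketch, assuming the data above are already loaded in memory under these variable names:

* scatter plot with the OLS line of best fit superimposed
twoway (scatter agriculture_va_lab enroll_primarygross) (lfit agriculture_va_lab enroll_primarygross)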
Graph (Scatter Diagram)

[Scatter diagram of agriculture_va_lab (vertical axis, roughly 140 to 240) against enroll_primary (% gross) (horizontal axis, roughly 20 to 100).]
Finding a Line of Best Fit

• We can use the general equation for a straight line,

  y = a + bx

  to get the line that best "fits" the data.
• However, this equation (y = a + bx) is completely deterministic.
• Is this realistic? No. So what we do is to add a random disturbance term, u, into the equation:

  yt = α + βxt + ut

  where t = 1985, 1986, 1987, ..., 2011.
Why do we include a Disturbance term?

• The disturbance term can capture a number of features:
  - We always leave out some determinants of yt.
  - There may be errors in the measurement of yt that cannot be modelled.
  - There are random outside influences on yt which we cannot model.
Determining the Regression Coefficients

• So how do we determine what α and β are?
• Choose α and β so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible).
Ordinary Least Squares

• The most common method used to fit a line to the data is known as OLS (ordinary least squares).
• What we actually do is to take each distance and square it (i.e. take the area of each of the squares in the diagram) and minimise the total sum of the squares (hence least squares).
• Tightening up the notation, let
  - yt denote the actual data point t,
  - ŷt denote the fitted value from the regression line,
  - ût denote the residual, yt − ŷt.
Actual and Fitted Value

[Diagram: for a given observation i, the residual ûi is the vertical distance between the actual data point yi and the fitted value ŷi on the regression line, at xi.]
How OLS Works

• So we minimise û₁² + û₂² + û₃² + û₄² + û₅², or, in general, minimise Σt ût². This is known as the residual sum of squares (RSS).
• But what was ût? It was the difference between the actual point and the line, yt − ŷt.
• So minimising Σt (yt − ŷt)² is equivalent to minimising Σt ût² with respect to α̂ and β̂.
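As an aside, Stata stores the minimised residual sum of squares after any regression; a minimal check, using the built-in auto dataset as an assumed stand-in for our data:

* the residual sum of squares that OLS minimises is saved in e(rss)
sysuse auto, clear
quietly regress mpg weight
display e(rss)    // the minimised RSS for this fitted line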
Deriving the OLS Estimator

• But $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, so let

$$L = \sum_t (y_t - \hat{y}_t)^2 = \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$$

• We want to minimise L with respect to (w.r.t.) α̂ and β̂, so we differentiate L w.r.t. α̂ and β̂:

$$\frac{\partial L}{\partial \hat{\alpha}} = -2 \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \quad (1)$$

$$\frac{\partial L}{\partial \hat{\beta}} = -2 \sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \quad (2)$$

• From (1), $\sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \;\Leftrightarrow\; \sum y_t - T\hat{\alpha} - \hat{\beta} \sum x_t = 0$
• But $\sum y_t = T\bar{y}$ and $\sum x_t = T\bar{x}$.
Deriving the OLS Estimator (cont'd)

• So we can write $T\bar{y} - T\hat{\alpha} - T\hat{\beta}\bar{x} = 0$, or $\bar{y} - \hat{\alpha} - \hat{\beta}\bar{x} = 0 \quad (3)$
• From (2), $\sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \quad (4)$
• From (3), $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \quad (5)$
• Substitute (5) into (4) for α̂:

$$\sum_t x_t (y_t - \bar{y} + \hat{\beta}\bar{x} - \hat{\beta} x_t) = 0$$

$$\sum_t x_t y_t - \bar{y}\sum_t x_t + \hat{\beta}\bar{x}\sum_t x_t - \hat{\beta}\sum_t x_t^2 = 0$$

$$\sum_t x_t y_t - T\bar{y}\bar{x} + \hat{\beta}T\bar{x}^2 - \hat{\beta}\sum_t x_t^2 = 0$$
Deriving the OLS Estimator (cont'd)

• Rearranging for β̂: $\hat{\beta}(T\bar{x}^2 - \sum x_t^2) = T\bar{y}\bar{x} - \sum x_t y_t$
• So overall we have

$$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2} \qquad \text{and} \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

• This method of finding the optimum is known as ordinary least squares (OLS).
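As a sanity check on these formulae, here is a small Stata sketch that computes β̂ and α̂ by hand and compares them with regress; the built-in auto dataset stands in for our data, which is an assumption purely for illustration:

* compute beta-hat and alpha-hat from the closed-form OLS formulae
sysuse auto, clear
quietly summarize weight
scalar xbar = r(mean)
quietly summarize mpg
scalar ybar = r(mean)
gen xy = weight*mpg
gen x2 = weight^2
quietly summarize xy
scalar sxy = r(sum)
quietly summarize x2
scalar sx2 = r(sum)
scalar bhat = (sxy - _N*xbar*ybar)/(sx2 - _N*xbar^2)
scalar ahat = ybar - bhat*xbar
display bhat          // slope from the formula
display ahat          // intercept from the formula
regress mpg weight    // the coefficients reported here should match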

reg agriculture_va_lab enroll_primarygross

      Source |       SS       df       MS              Number of obs =      26
-------------+------------------------------           F(1, 24)      =   21.91
       Model |  5651.68212     1  5651.68212           Prob > F      =  0.0001
    Residual |  6190.47173    24  257.936322           R-squared     =  0.4773
-------------+------------------------------           Adj R-squared =  0.4555
       Total |  11842.1538    25  473.686154           Root MSE      =   16.06

 agriculture_va_lab |    Coef.   Std. Err.      t   P>|t|   [95% Conf. Interval]
--------------------+---------------------------------------------------------
enroll_primarygross | .5940433    .126907   4.68   0.000    .3321202   .8559664
              _cons | 139.5295   7.693246  18.14   0.000    123.6514   155.4075
What Do We Use α̂ and β̂ For?

• In the example used above, plugging the 26 observations into the formulae given above leads to the estimates α̂ = 139.5295 and β̂ = 0.594.
• We would write the fitted line as ŷ = 139.53 + 0.594x, where y is agricultural labour productivity and x is gross enrollment (%).
• Question: if gross enrollment increases by one unit (one percentage point), what will happen to agricultural labour productivity?
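A quick way to use the fitted line is to plug values straight into it; for instance, a sketch of the predicted productivity at a hypothetical enrollment rate of 50% (the value 50 is an assumption for illustration):

* fitted value at x = 50, using the estimates above
display 139.5295 + 0.594*50    // approximately 169.23

The answer to the question above then follows from the slope: each one-percentage-point rise in gross enrollment is associated, on average, with a rise of about 0.594 in agricultural labour productivity.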
Accuracy of Intercept Estimate

• Care needs to be exercised when considering the intercept estimate, particularly if there are no or few observations close to the y-axis.

[Diagram: a scatter of points lying far from the y-axis; extrapolating the fitted line back to x = 0 makes the intercept estimate unreliable.]
The Population and the Sample

• The population is the total collection of all objects or people to be studied.
• A sample is a selection of just some items from the population.
• A random sample is a sample in which each individual item in the population is equally likely to be drawn.
The Data Generating Process (DGP) and the PRF

• The population regression function (PRF) is a description of the model that is thought to be generating the actual data and the true relationship between the variables (i.e. the true values of α and β).
• The PRF is yt = α + βxt + ut.
• The sample regression function (SRF) is ŷt = α̂ + β̂xt, and we also know that ût = yt − ŷt.
• We use the SRF to infer likely values of the PRF.
• We also want to know how "good" our estimates of α and β are.
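In Stata, the SRF's fitted values and residuals can be recovered after any regression; a minimal sketch, again using the built-in auto dataset as an assumed stand-in:

* fitted values (y-hat) and residuals (u-hat) from the SRF
sysuse auto, clear
quietly regress mpg weight
predict yhat               // y-hat: the fitted values
predict uhat, residuals    // u-hat = y minus y-hat
list mpg yhat uhat in 1/5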
Linearity

• In order to use OLS, we need a model which is linear in the parameters (α and β). It does not necessarily have to be linear in the variables (y and x).
• Linear in the parameters means that the parameters are not multiplied together, divided, squared, cubed, etc.
• Some models can be transformed to linear ones by a suitable substitution or manipulation, e.g. the exponential regression model

$$Y_t = e^{\alpha} X_t^{\beta} e^{u_t} \quad\Leftrightarrow\quad \ln Y_t = \alpha + \beta \ln X_t + u_t$$

• Then let yt = ln Yt and xt = ln Xt:

$$y_t = \alpha + \beta x_t + u_t$$
Linear and Non-linear Models

• This is known as the exponential regression model. Here, the coefficients can be interpreted as elasticities.
• Similarly, if theory suggests that y and x should be inversely related:

$$y_t = \alpha + \frac{\beta}{x_t} + u_t$$

then the regression can be estimated using OLS by substituting

$$z_t = \frac{1}{x_t}$$

• But some models are intrinsically non-linear, e.g.

$$y_t = \alpha + \beta x_t^{\gamma} + u_t$$
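A minimal Stata sketch of estimating the log-log (exponential) form, in which the slope is an elasticity; the built-in auto dataset and the price-weight pairing are assumptions chosen purely for illustration:

* estimate ln Y = alpha + beta ln X + u; beta is the elasticity of Y w.r.t. X
sysuse auto, clear
gen lprice  = ln(price)    // y_t = ln Y_t
gen lweight = ln(weight)   // x_t = ln X_t
regress lprice lweight     // the slope estimates the elasticity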
Estimator or Estimate?

• Estimators are the formulae used to calculate the coefficients.
• Estimates are the actual numerical values for the coefficients.
The Assumptions Underlying the Classical Linear Regression Model (CLRM)

• The model which we have used is known as the classical linear regression model.
• We observe data for xt, but since yt also depends on ut, we must be specific about how the ut are generated.
• We usually make the following set of assumptions about the ut's (the unobservable error terms):
  1. E(ut) = 0: the errors have zero mean.
  2. Var(ut) = σ²: the variance of the errors is constant and finite over all values of xt.
  3. Cov(ui, uj) = 0: the errors are statistically independent of one another.
  4. Cov(ut, xt) = 0: there is no relationship between the error and the corresponding x variate.
The Assumptions Underlying the CLRM Again

• An alternative assumption to 4., which is slightly stronger, is that the xt's are non-stochastic or fixed in repeated samples.
• A fifth assumption is required if we want to make inferences about the population parameters (the actual α and β) from the sample parameters (α̂ and β̂).
• Additional assumption:
  5. ut is normally distributed.
Properties of the OLS Estimator

• If assumptions 1. through 4. hold, then the estimators α̂ and β̂ determined by OLS are known as Best Linear Unbiased Estimators (BLUE). What does the acronym stand for?
  - "Estimator": β̂ is an estimator of the true value of β.
  - "Linear": β̂ is a linear estimator.
  - "Unbiased": on average, the actual values of α̂ and β̂ will be equal to the true values.
  - "Best": the OLS estimator β̂ has minimum variance among the class of linear unbiased estimators.
Consistency/Unbiasedness/Efficiency

• Consistency: the least squares estimators α̂ and β̂ are consistent. That is, the estimates will converge to their true values as the sample size increases to infinity. We need the assumptions E(xtut) = 0 and Var(ut) = σ² < ∞ to prove this. Consistency implies that

$$\lim_{T \to \infty} \Pr\left( \left| \hat{\beta} - \beta \right| > \delta \right) = 0 \quad \forall\, \delta > 0$$

• Unbiasedness: the least squares estimators α̂ and β̂ are unbiased. That is, E(α̂) = α and E(β̂) = β. Thus on average the estimated values will be equal to the true values. To prove this also requires the assumption that E(ut) = 0. Unbiasedness is a stronger condition than consistency.
• Efficiency: an estimator β̂ of parameter β is said to be efficient if it is unbiased and no other unbiased estimator has a smaller variance. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of β.
Example: How to Calculate the Parameters and Standard Errors

• Assume we have the following data calculated from a regression of y on a single variable x and a constant over 22 observations.
• Data:

$$\sum x_t y_t = 830102, \quad T = 22, \quad \bar{x} = 416.5, \quad \bar{y} = 86.65, \quad \sum x_t^2 = 3919654, \quad RSS = 130.6$$

• Calculations:

$$\hat{\beta} = \frac{830102 - 22 \times 416.5 \times 86.65}{3919654 - 22 \times (416.5)^2} = 0.35$$

$$\hat{\alpha} = 86.65 - 0.35 \times 416.5 = -59.12$$

• We write

$$\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t, \qquad \hat{y}_t = -59.12 + 0.35 x_t$$
Example (cont'd)

• SE(regression),

$$s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}} = \sqrt{\frac{130.6}{20}} = 2.55$$

$$SE(\hat{\alpha}) = 2.55 \times \sqrt{\frac{3919654}{22 \times \left(3919654 - 22 \times (416.5)^2\right)}} = 3.35$$

$$SE(\hat{\beta}) = 2.55 \times \sqrt{\frac{1}{3919654 - 22 \times (416.5)^2}} = 0.0079$$

• We now write the results as

$$\hat{y}_t = -59.12 + 0.35 x_t$$
$$\qquad\quad\;\, (3.35) \quad (0.0079)$$
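These numbers can be reproduced with Stata's display command used as a calculator; a minimal check of the arithmetic above:

* verify the worked example step by step
display (830102 - 22*416.5*86.65)/(3919654 - 22*416.5^2)   // beta-hat = 0.35
display 86.65 - 0.35*416.5                                 // alpha-hat = -59.12
display sqrt(130.6/20)                                     // s = 2.55
display 2.55*sqrt(3919654/(22*(3919654 - 22*416.5^2)))     // SE(alpha-hat) = 3.35
display 2.55*sqrt(1/(3919654 - 22*416.5^2))                // SE(beta-hat) = 0.0079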
An Introduction to Statistical Inference

• We want to make inferences about the likely population values from the regression parameters.
• Example: suppose we have the following regression results:

  ŷt = 20.3 + 0.5091xt
       (14.38)  (0.2561)

• β̂ = 0.5091 is a single (point) estimate of the unknown population parameter, β. How "reliable" is this estimate?
• The reliability of the point estimate is measured by the coefficient's standard error.
Hypothesis Testing: Some Concepts

• We can use the information in the sample to make inferences about the population.
• We will always have two hypotheses that go together: the null hypothesis (denoted H0) and the alternative hypothesis (denoted H1).
• The null hypothesis is the statement, or statistical hypothesis, that is actually being tested. The alternative hypothesis represents the remaining outcomes of interest.
• For example, suppose that, given the regression results above, we are interested in the hypothesis that the true value of β is in fact 0.5. We would use the notation
  H0: β = 0.5
  H1: β ≠ 0.5
  This would be known as a two-sided test.
One-Sided Hypothesis Tests

• Sometimes we may have some prior information that, for example, we would expect β > 0.5 rather than β < 0.5. In this case, we would do a one-sided test:
  H0: β = 0.5
  H1: β > 0.5
  or we could have had
  H0: β = 0.5
  H1: β < 0.5
• There are two ways to conduct a hypothesis test: via the test of significance approach or via the confidence interval approach.
The Probability Distribution of the Least Squares Estimators

• We assume that ut ∼ N(0, σ²).
• The least squares estimators are linear combinations of the random variables, i.e. β̂ = Σ wt yt.
• The weighted sum of normal random variables is also normally distributed, so

$$\hat{\alpha} \sim N(\alpha, \operatorname{Var}(\hat{\alpha})), \qquad \hat{\beta} \sim N(\beta, \operatorname{Var}(\hat{\beta}))$$

• What if the errors are not normally distributed? Will the parameter estimates still be normally distributed?
• Yes, approximately, if the other assumptions of the CLRM hold and the sample size is sufficiently large.
The Probability Distribution of the Least Squares Estimators (cont'd)

• Standard normal variates can be constructed from α̂ and β̂:

$$\frac{\hat{\alpha} - \alpha}{\sqrt{\operatorname{Var}(\hat{\alpha})}} \sim N(0,1) \qquad \text{and} \qquad \frac{\hat{\beta} - \beta}{\sqrt{\operatorname{Var}(\hat{\beta})}} \sim N(0,1)$$

• But Var(α̂) and Var(β̂) are unknown, so

$$\frac{\hat{\alpha} - \alpha}{SE(\hat{\alpha})} \sim t_{T-2} \qquad \text{and} \qquad \frac{\hat{\beta} - \beta}{SE(\hat{\beta})} \sim t_{T-2}$$
Testing Hypotheses: The Test of Significance Approach

• Assume the regression equation is given by yt = α + βxt + ut, for t = 1, 2, ..., T.
• The steps involved in doing a test of significance are:
  1. Estimate α̂, β̂ and SE(α̂), SE(β̂) in the usual way.
  2. Calculate the test statistic. This is given by the formula

$$\text{test statistic} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})}$$

  where β* is the value of β under the null hypothesis.
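A small Stata sketch of this calculation; the auto dataset and the null value 0.5 (echoing the example above) are illustrative assumptions:

* test H0: beta = 0.5 via the test-of-significance approach
sysuse auto, clear
quietly regress mpg weight
display (_b[weight] - 0.5)/_se[weight]   // test statistic computed by hand
test weight = 0.5                        // Stata's built-in equivalent (F version)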
The Test of Significance Approach (cont'd)

• We need some tabulated distribution with which to compare the estimated test statistics. Test statistics derived in this way can be shown to follow a t-distribution with T − 2 degrees of freedom.
• As the number of degrees of freedom increases, we need to be less cautious in our approach, since we can be more sure that our results are robust.
The Test of Significance Approach (cont'd)

• We need to choose a "significance level", often denoted α (not to be confused with the intercept). This is also sometimes called the size of the test, and it determines the region where we will reject or not reject the null hypothesis that we are testing. It is conventional to use a significance level of 5%, though 10% and 1% are also commonly used.
• The intuitive explanation of a 5% level is that we would only expect a result as extreme as this, or more extreme, 5% of the time as a consequence of chance alone.
The Confidence Interval Approach to Hypothesis Testing

• An example of its usage: we estimate a parameter, say β, to be 0.93, and a "95% confidence interval" to be (0.77, 1.09). This means that we are 95% confident that the interval contains the true (but unknown) value of β.
• Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed.
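A confidence interval of this kind is assembled as β̂ ± t_crit × SE(β̂); a minimal Stata sketch of computing the bounds after a regression (the auto dataset is an assumption for illustration):

* 95% confidence interval for the slope, built by hand
sysuse auto, clear
quietly regress mpg weight
display _b[weight] - invttail(e(df_r), 0.025)*_se[weight]   // lower bound
display _b[weight] + invttail(e(df_r), 0.025)*_se[weight]   // upper bound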
Some More Terminology

• If we reject the null hypothesis at the 5% level, we say that the result of the test is statistically significant.
The t-ratio: An Example

• Suppose that we have the following parameter estimates, standard errors and t-ratios for an intercept and slope respectively:

                Intercept    Slope
  Coefficient     1.10       -4.40
  SE              1.35        0.96
  t-ratio         0.81       -4.63

• Compare these with t_crit with 15 − 3 = 12 d.f. (2.5% in each tail for a 5% test): 2.179 at the 5% level, 3.055 at the 1% level.
• Do we reject H0: β1 = 0? (No)
  Do we reject H0: β2 = 0? (Yes)
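The critical values quoted above can be checked with Stata's inverse-t function:

* two-sided critical values from the t distribution with 12 d.f.
display invttail(12, 0.025)   // 2.179: the 5% two-sided critical value
display invttail(12, 0.005)   // 3.055: the 1% two-sided critical value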
What Does the t-ratio tell us?

• If we reject H0, we say that the result is significant. If the coefficient is not "significant" (e.g. the intercept coefficient in the last regression above), then it means that the variable is not helping to explain variations in y.
• Variables that are not significant are usually removed from the regression model.
• In practice, however, there are good statistical reasons for always including a constant, even if it is not significant. Consider what happens if no intercept is included:

[Diagram: yt plotted against xt with the regression line forced through the origin, fitting the data poorly.]