Autocorrelation
Recap:
Regression equation:
Y_i = \beta_1 + \beta_2 X_i + u_i
Y_i depends on both X_i and u_i. If no restrictions are placed on how X_i and u_i
are generated, there is no way we can make any statistical inference about Y_i, or
about \beta_1 and \beta_2. The assumptions made about the variables X_i and the
error term u_i are therefore critical to the valid interpretation of the
regression estimates.
One of these assumptions is no autocorrelation between the disturbances: given any
two values X_i and X_j (i \neq j), the correlation between u_i and u_j is zero,
i.e. Cov(u_i, u_j) = 0.
Autocorrelation is correlation between members of a series of observations ordered
in time (time-series data). For example, in a regression of output on labour and
capital inputs using quarterly time-series data, if a labour strike affects output
in one quarter, there is no reason to expect the disruption to be carried over to
the next quarter.
In cross-sectional data, it is correlation between members of a series of
observations ordered in space. For example, in a regression of family consumption
expenditure on family income, one family's consumption expenditure is not expected
to affect the consumption expenditure of another family.
Nature of autocorrelation
1. Inertia: the most common feature of economic time-series data (GNP, price
   indexes, production, employment, unemployment, etc.), owing to business cycles.
2. Specification bias: excluded-variables case. A researcher often starts with a
   plausible regression model that may not be the most perfect one. Suppose the
   true regression is given by
   Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + u_t
   but the researcher uses the following regression equation
   Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + v_t
   where v_t = \beta_4 X_{4t} + u_t. The error term v_t then picks up the
   systematic influence of the omitted X_{4t}, producing (apparent)
   autocorrelation.
3. Specification bias: incorrect functional form. Suppose the true/correct model
   in a cost-output study is
   MC_i = \beta_1 + \beta_2 Output_i + \beta_3 Output_i^2 + u_i
   but the researcher fits a linear marginal cost function; the omitted quadratic
   term will show up in the residuals as a systematic pattern.
4. Cobweb phenomenon: supply reacts to price with a lag of one time period
   (supply decisions take time to implement), e.g.
   Supply_t = \beta_1 + \beta_2 P_{t-1} + u_t
5. Lags: for example, current consumption depends on consumption in the previous
   period.
6. Manipulation of data. Examples:
   1. Time-series data involving quarterly data derived from monthly data (by
      averaging the three monthly observations); this averaging smooths the
      fluctuations present in the monthly data.
   2. Interpolation or extrapolation of data, e.g. inter-census years
      interpolated from census data on the basis of some ad hoc assumptions.
   The techniques used in manipulating the data may impose upon them a systematic
   pattern that does not exist in the original data.
OLS estimation in the presence of autocorrelation
Recall the two-variable model
Y_t = \beta_1 + \beta_2 X_t + u_t
where, say, Y is consumption expenditure and X is income. Assume the model
satisfies all the other OLS assumptions, but introduce autocorrelation into the
disturbance term:
E(u_t u_{t+s}) \neq 0   (s \neq 0)
In particular, assume the disturbances are generated as
u_t = \rho u_{t-1} + \varepsilon_t,   -1 < \rho < 1
where \rho is the coefficient of autocovariance and \varepsilon_t is a
white-noise error term satisfying
E(\varepsilon_t) = 0
Var(\varepsilon_t) = \sigma_\varepsilon^2
Cov(\varepsilon_t, \varepsilon_{t+s}) = 0   (s \neq 0)
Note:
The scheme u_t = \rho u_{t-1} + \varepsilon_t is known as a Markov first-order
autoregressive scheme (first-order autoregressive scheme), denoted AR(1).
Higher-order schemes are defined analogously; for example, a third-order
autoregressive scheme, AR(3), is
u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \rho_3 u_{t-3} + \varepsilon_t
From
u_t = \rho u_{t-1} + \varepsilon_t
\rho (rho), the coefficient of autocovariance, can be interpreted as the
first-order coefficient of autocorrelation (the coefficient of autocorrelation
at lag 1):
\rho = E[(u_t - E(u_t))(u_{t-1} - E(u_{t-1}))] / \sqrt{Var(u_t) Var(u_{t-1})}
     = E(u_t u_{t-1}) / Var(u_{t-1})
since E(u_t) = 0 for all t and Var(u_t) = Var(u_{t-1}) under stationarity.
Under the AR(1) scheme:
Var(u_t) = E(u_t^2) = \sigma_\varepsilon^2 / (1 - \rho^2)
Cov(u_t, u_{t+s}) = E(u_t u_{t+s}) = \rho^s \sigma_\varepsilon^2 / (1 - \rho^2)
Cor(u_t, u_{t+s}) = \rho^s
where Cov(u_t, u_{t+s}) means the covariance between error terms s periods apart
and where Cor(u_t, u_{t+s}) means the correlation between error terms s periods
apart.
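The AR(1) variance and autocorrelation formulas can be checked numerically with a short simulation. This is a minimal sketch: the values of \rho and the sample size are arbitrary choices for the demo, and \sigma_\varepsilon^2 is set to 1.

```python
import numpy as np

rng = np.random.default_rng(42)
rho, n = 0.7, 50_000              # hypothetical values chosen for the demo

# Generate u_t = rho * u_{t-1} + eps_t with white-noise eps_t, Var(eps) = 1
eps = rng.standard_normal(n)
u = np.empty(n)
u[0] = eps[0] / np.sqrt(1 - rho ** 2)  # draw u_0 from the stationary distribution
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

var_u = u.var()                        # theory: 1 / (1 - rho^2)
r1 = np.corrcoef(u[1:], u[:-1])[0, 1]  # theory: rho
r2 = np.corrcoef(u[2:], u[:-2])[0, 1]  # theory: rho^2
print(var_u, r1, r2)
```

With a large sample, the sample variance is close to \sigma_\varepsilon^2/(1-\rho^2) = 1/0.51 and the lag-1 and lag-2 sample autocorrelations are close to \rho and \rho^2 respectively.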
Consequences for OLS: consider again
Y_t = \beta_1 + \beta_2 X_t + u_t
Under the classical assumptions, the variance of the OLS estimator \hat{\beta}_2
is
Var(\hat{\beta}_2) = \sigma^2 / \sum x_t^2
(with x_t = X_t - \bar{X} in deviation form). Under the AR(1) scheme it becomes
Var(\hat{\beta}_2)_{AR(1)} = (\sigma^2 / \sum x_t^2) [ 1
    + 2\rho \sum x_t x_{t+1} / \sum x_t^2
    + 2\rho^2 \sum x_t x_{t+2} / \sum x_t^2
    + ... + 2\rho^{n-1} x_1 x_n / \sum x_t^2 ]
where Var(\hat{\beta}_2)_{AR(1)} means the variance of \hat{\beta}_2 under the
first-order autoregressive scheme. If, in addition, X itself follows an AR(1)
scheme with coefficient r, this can be approximated as
Var(\hat{\beta}_2)_{AR(1)} \approx Var(\hat{\beta}_2)_{OLS} (1 + r\rho) / (1 - r\rho)
Note: when there is (positive) autocorrelation, the usual OLS formula
\sigma^2 / \sum x_t^2 will underestimate Var(\hat{\beta}_2)_{AR(1)}, so the
conventional t and F tests are no longer reliable.
Durbin-Watson test
The presence of first-order autocorrelation is tested by utilizing the table of
Durbin-Watson statistics at the 5% or 1% level of significance for n observations
and k explanatory variables. The Durbin-Watson statistic is calculated as the
ratio of the sum of squared differences in successive residuals to the residual
sum of squares (RSS):
d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2
The test rests on the following assumptions:
1. The regression model includes an intercept term.
2. The explanatory variables X are nonstochastic (fixed in repeated sampling).
3. The disturbances u_t are generated by the first-order autoregressive scheme
   u_t = \rho u_{t-1} + \varepsilon_t.
4. The error term u_t is assumed to be normally distributed.
5. The regression model does not include the lagged value(s) of the dependent
   variable as one of the explanatory variables.
6. There are no missing observations in the data.
Estimation of d-statistics
Expanding the numerator:
d = [ \sum e_t^2 + \sum e_{t-1}^2 - 2 \sum e_t e_{t-1} ] / \sum e_t^2
Since \sum e_t^2 and \sum e_{t-1}^2 differ only by one observation, they are
approximately equal, so
d \approx 2 [ 1 - \sum e_t e_{t-1} / \sum e_t^2 ] = 2 (1 - \hat{\rho})
where \hat{\rho} = \sum e_t e_{t-1} / \sum e_t^2 is the sample first-order
autocorrelation coefficient of the residuals. Since -1 \le \hat{\rho} \le 1, it
follows that 0 \le d \le 4.
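The d statistic and its relation to the residual autocorrelation can be illustrated with a short simulation. This is a sketch on hypothetical data: the data-generating process, sample size, and \rho = 0.6 are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)

# Hypothetical DGP: y = 1 + 2x + u, with AR(1) errors u_t = 0.6 u_{t-1} + eps_t
eps = rng.standard_normal(n)
u = np.empty(n)
u[0] = eps[0]
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u

# Step 1: run OLS and obtain the residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

# Step 2: Durbin-Watson statistic and its approximation 2(1 - rho_hat)
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)
print(d, 2 * (1 - rho_hat))
```

With positive autocorrelation in the errors, d comes out well below 2, consistent with d \approx 2(1 - \hat{\rho}).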
Durbin and Watson were successful in deriving a lower bound d_L and an upper
bound d_U such that if the computed d lies outside these critical values, a
decision can be made regarding the presence of positive or negative serial
correlation.
To compute the Durbin-Watson test (assuming that the assumptions underlying the
test are fulfilled):
1. Run the OLS regression and obtain the residuals.
2. Compute d from the residuals.
3. For the given sample size and given number of explanatory variables, find out
   the critical d_L and d_U values.
Decision rules:
- 0 < d < d_L: reject the null of no positive autocorrelation.
- d_L \le d \le d_U: the test is inconclusive.
- d_U < d < 4 - d_U: do not reject the null of no autocorrelation.
- 4 - d_U \le d \le 4 - d_L: the test is inconclusive.
- 4 - d_L < d < 4: reject the null of no negative autocorrelation.
These limits depend only on the number of observations n and the number of
explanatory variables, and do not depend on the values taken by these explanatory
variables. The limits, for n going from 6 to 200 and up to 20 explanatory
variables, have been tabulated by Durbin and Watson (for example, for a
particular n and k, d_L = 1.44 and d_U = 1.54).
Because of the indecisive zone, one cannot always conclude whether (first-order)
autocorrelation does or does not exist. However, it has been found that the upper
limit d_U is approximately the true significance limit, and therefore when d lies
in the indecisive zone one can use the following modified d test. Given the level
of significance \alpha:
1. H_0: \rho = 0 versus H_1: \rho > 0: reject H_0 at level \alpha if d < d_U.
2. H_0: \rho = 0 versus H_1: \rho < 0: reject H_0 at level \alpha if 4 - d < d_U.
3. H_0: \rho = 0 versus H_1: \rho \neq 0: reject H_0 at level 2\alpha if d < d_U
   or 4 - d < d_U.
Breusch-Godfrey (BG) test
A more general test allows for higher-order autoregressive schemes. Consider
Y_t = \beta_1 + \beta_2 X_t + u_t
with
u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + ... + \rho_p u_{t-p} + \varepsilon_t
The null hypothesis is that each of the \rho's is zero, against the alternative
that at least one of them is not zero:
H_0: \rho_1 = \rho_2 = ... = \rho_p = 0
Step (1): Estimate the original regression by OLS and obtain the residuals e_t.
Step (2): Regress e_t against all the independent variables X_{t1}, ..., X_{tk}
plus the lagged residuals e_{t-1}, ..., e_{t-p}; the sample size of this
auxiliary regression is T - p.
Step (3): Compute (T - p) R_e^2 from Step 2. If it exceeds \chi^2_{p,(1-\alpha)},
the critical value of the chi-square distribution with p d.f., then reject
H_0: \rho_1 = \rho_2 = ... = \rho_p = 0, in which case at least one \rho is
statistically significantly different from zero.
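The three steps of the BG test can be sketched directly with numpy. This is a minimal illustration, not a production implementation: the function name, the simulated data, and the choice \rho = 0.5 are all hypothetical, and the chi-square critical value is hard-coded for p = 1 at the 5% level.

```python
import numpy as np

def breusch_godfrey(y, X, p):
    """Breusch-Godfrey LM statistic for AR(p) serial correlation.

    Step 1: OLS residuals e_t.  Step 2: auxiliary regression of e_t on X
    and e_{t-1}, ..., e_{t-p}.  Step 3: (T - p) * R^2, chi2(p) under H0.
    """
    n = len(y)
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # Auxiliary regressors: X plus p lags of the residuals (drop first p obs)
    Z = np.column_stack([X[p:]] + [e[p - j:n - j] for j in range(1, p + 1)])
    et = e[p:]
    resid = et - Z @ np.linalg.lstsq(Z, et, rcond=None)[0]
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((et - et.mean()) ** 2)
    return (n - p) * r2

# Hypothetical demo: AR(1) errors should be detected, iid errors should not
rng = np.random.default_rng(1)
n = 400
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

eps = rng.standard_normal(n)
u = np.empty(n)
u[0] = eps[0]
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + eps[t]

lm_ar = breusch_godfrey(1 + 2 * x + u, X, p=1)                # correlated errors
lm_iid = breusch_godfrey(1 + 2 * x + rng.standard_normal(n), X, p=1)
print(lm_ar, lm_iid)  # compare each with the chi2(1) 5% critical value 3.84
```

The statistic for the serially correlated series is far above 3.84, while the iid series typically yields a small value.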
Remedial Measures
(1) Find out whether the autocorrelation is pure autocorrelation and not the
    result of mis-specification of the model.
(2) If it is pure autocorrelation, one can use an appropriate transformation of
    the original model so that in the transformed model we do not have the
    problem of (pure) autocorrelation (use of the generalized least squares, GLS,
    method).
(3) In large samples, use the Newey-West method to obtain standard errors of the
    OLS estimators that are corrected for autocorrelation (an extension of
    White's heteroscedasticity-consistent standard errors method).
(4) In some situations we can continue to use the OLS method.
The appropriate remedy depends on our knowledge about the nature of the
interdependence among the disturbances.
Consider again
Y_t = \beta_1 + \beta_2 X_t + u_t,   u_t = \rho u_{t-1} + \varepsilon_t,
-1 < \rho < 1
Case 1: \rho is known.
Lagging the model one period, multiplying by \rho, and subtracting gives the
generalized difference equation
Y_t - \rho Y_{t-1} = \beta_1 (1 - \rho) + \beta_2 (X_t - \rho X_{t-1}) + \varepsilon_t
where
\varepsilon_t = u_t - \rho u_{t-1}
We can express this as
Y_t^* = \beta_1^* + \beta_2^* X_t^* + \varepsilon_t
Since the error term \varepsilon_t satisfies the usual OLS assumptions, we can
apply OLS to the transformed variables Y_t^* and X_t^* and obtain estimators with
all the optimum (BLUE) properties.
In this differencing procedure we lose one observation, because the first
observation has no antecedent. To avoid this loss of one observation, the first
observation on Y and X is transformed as follows (the Prais-Winsten
transformation):
Y_1^* = \sqrt{1 - \rho^2} \, Y_1,   X_1^* = \sqrt{1 - \rho^2} \, X_1
GLS is nothing but OLS applied to the transformed model that satisfies the
classical assumptions.
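The quasi-differencing transformation for known \rho, with the Prais-Winsten treatment of the first observation, can be sketched as follows. The function name and the simulated data (n, \rho = 0.8, true coefficients 1 and 2) are hypothetical choices for the demo.

```python
import numpy as np

def quasi_difference(y, x, rho):
    """Generalized (quasi) differences for known rho, keeping the first
    observation via the Prais-Winsten scaling sqrt(1 - rho^2)."""
    y_star, x_star = np.empty_like(y), np.empty_like(x)
    c = np.sqrt(1 - rho ** 2)
    y_star[0], x_star[0] = c * y[0], c * x[0]      # first observation
    y_star[1:] = y[1:] - rho * y[:-1]
    x_star[1:] = x[1:] - rho * x[:-1]
    return y_star, x_star

# Hypothetical demo data with AR(1) errors, rho = 0.8
rng = np.random.default_rng(7)
n, rho = 300, 0.8
x = rng.standard_normal(n)
eps = rng.standard_normal(n)
u = np.empty(n)
u[0] = eps[0] / np.sqrt(1 - rho ** 2)              # stationary start
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u

# OLS on the transformed data; the intercept column is transformed the same way
y_s, x_s = quasi_difference(y, x, rho)
c_s, _ = quasi_difference(np.ones(n), np.ones(n), rho)
beta = np.linalg.lstsq(np.column_stack([c_s, x_s]), y_s, rcond=None)[0]
print(beta)  # estimates of (beta_1, beta_2)
```

Because the transformed error term is white noise, OLS on the starred variables recovers the true coefficients with the usual BLUE properties.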
Case 2: \rho is not known. Several methods can be used to estimate the value of
\rho. One is based on the Durbin-Watson d statistic: from the relationship
d \approx 2(1 - \hat{\rho}), \rho can be estimated as
\hat{\rho} \approx 1 - d/2
The first-difference method
Since \rho lies between 0 and \pm 1, one could start from an extreme position.
In the generalized difference equation
Y_t - \rho Y_{t-1} = \beta_1 (1 - \rho) + \beta_2 (X_t - \rho X_{t-1}) + \varepsilon_t
setting \rho = 1 gives the first-difference equation
Y_t - Y_{t-1} = \beta_2 (X_t - X_{t-1}) + (u_t - u_{t-1})
i.e. a regression of \Delta Y_t on \Delta X_t with no intercept term. This may be
appropriate if the coefficient of autocorrelation is very high.
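The first-difference regression can be sketched in a few lines. The demo data are hypothetical: errors are generated close to a random walk (\rho near 1), the case for which first differencing is appropriate.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.standard_normal(n).cumsum()   # hypothetical trending regressor
u = rng.standard_normal(n).cumsum()   # errors that behave like a random walk
y = 1.0 + 2.0 * x + u

# First-difference regression of dY on dX, with NO intercept term:
# dY_t = beta2 * dX_t + (u_t - u_{t-1})
dy, dx = np.diff(y), np.diff(x)
beta2 = np.sum(dx * dy) / np.sum(dx ** 2)
print(beta2)  # estimate of beta_2
```

Differencing removes both the intercept and the random-walk component of the error, so the slope is estimated consistently.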
To test the hypothesis that \rho = 1, one can use the Berenblutt-Webb g
statistic:
g = \sum_{t=2}^{n} \hat{e}_t^2 / \sum_{t=1}^{n} e_t^2
where e_t are the OLS residuals from the original regression and \hat{e}_t are
the OLS residuals from the first-difference regression. Use the Durbin-Watson
tables, except that now the null hypothesis is \rho = 1 rather than \rho = 0.
More generally, an estimate of \rho is obtained and then used to run GLS. All
these methods of estimation are known as feasible GLS (FGLS) or estimated GLS
(EGLS) methods.
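One common FGLS procedure, the Cochrane-Orcutt iteration, alternates between estimating \rho from the residuals and re-estimating the coefficients on quasi-differenced data. This is a sketch under hypothetical demo data (\rho = 0.8, true coefficients 1 and 2); the function name and iteration count are arbitrary.

```python
import numpy as np

def cochrane_orcutt(y, x, iters=10):
    """Feasible GLS when rho is unknown (Cochrane-Orcutt sketch):
    estimate rho from OLS residuals, quasi-difference, re-estimate, iterate."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rho = 0.0
    for _ in range(iters):
        e = y - X @ beta                                 # residuals at current beta
        rho = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
        ys = y[1:] - rho * y[:-1]                        # quasi-differenced data
        Xs = np.column_stack([np.full(n - 1, 1.0 - rho),
                              x[1:] - rho * x[:-1]])
        beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]    # (beta_1, beta_2)
    return beta, rho

# Hypothetical demo data with rho = 0.8
rng = np.random.default_rng(5)
n = 400
x = rng.standard_normal(n)
eps = rng.standard_normal(n)
u = np.empty(n)
u[0] = eps[0] / np.sqrt(1 - 0.8 ** 2)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u

beta, rho_hat = cochrane_orcutt(y, x)
print(beta, rho_hat)
```

The iteration typically converges in a few steps, returning estimates close to the true \rho and coefficients.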
Note that HAC (heteroscedasticity- and autocorrelation-consistent) standard
errors are typically much greater than OLS standard errors, so HAC t-ratios are
much smaller than OLS t-ratios.