CH - 4 - Application to Time Series and Panel Data in Stata
Mengistu Yismaw (MSc.)
Department of Economics
Debre Markos University (Burie Campus)
Email: menyis.2012@gmail.com
Chapter Outline
• Introduction to time series regression model
  - What is time series data
  - Dealing with time series data: declaring the data as time series, sorting
  - Some diagnostic tests
    o Autocorrelation: Durbin-Watson test, Breusch-Godfrey test
  - Dealing with autocorrelation
  - Stationarity test
    o Informal tests: graphical methods of detecting non-stationarity
    o Formal tests: unit root test
  - Estimation of stationary data: OLS
  - Dealing with non-stationary data
    o Differencing
    o Detrending
  - Estimation of non-stationary data: cointegration and ECM
• Introduction to panel regression model
  - What is panel data
  - Specification and estimation of panel regression model
  - Fixed vs random effect model: Hausman test
4.1. Introduction to Time Series Regression Model
What is Time Series Data
• Time-series data is a sequence of data points collected over time.
  o It allows us to track changes over time.
• Time-series data can track changes over years, quarters, months, weeks, days, seconds...
• A collection of random variables ordered in time is known as a stochastic process.
In estimating time series data, the first thing we need to do is declare the data as time series.
To declare the data as yearly data
i. Using menu bar
Click Statistics > Time series > Setup and utilities > Declare dataset to be time-series data
Then you will see the following dialog box
ii. Using command bar
Syntax: tsset timevar, yearly
Example: tsset Year, yearly
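Note: you can declare time series data at any time unit (frequency) you want.
For example,
✓ Quarterly: tsset timevar, quarterly
✓ Monthly: tsset timevar, monthly
✓ Daily: tsset timevar, daily
In each case timevar must hold dates coded at that frequency; a variable that only holds the year is suitable for yearly data.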
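A minimal sketch of declaring quarterly data, assuming hypothetical variables year and quarter that are not part of the NPL/Loans dataset. Stata expects the time variable to be coded in the unit being declared, so one common approach is to build a quarterly date with yq(), give it the %tq display format, and then tsset it.

```stata
* Hypothetical data: 10 years of quarterly observations (all names are illustrative)
clear
set obs 40
* year runs 2012, 2012, 2012, 2012, 2013, ... and quarter runs 1, 2, 3, 4, 1, ...
generate year    = 2012 + floor((_n - 1)/4)
generate quarter = mod(_n - 1, 4) + 1

* Build a Stata quarterly date from year and quarter, then declare it
generate qdate = yq(year, quarter)
format qdate %tq
tsset qdate, quarterly
```

The same pattern should carry over to monthly data with ym() and %tm.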
Time series data have certain characteristics that cross-sectional data do not.
These require special attention when applying OLS to time series data.
Before running diagnostic tests, we have to sort the data by time.
Syntax: sort timevar
Example: sort Year
1. Autocorrelation Test
The autocorrelation (serial correlation) problem arises when the error terms in a regression model are correlated over time, i.e. depend on each other.
i.e. if there is an autocorrelation problem, $\mathrm{cov}(u_i, u_j \mid X_i, X_j) = E(u_i u_j) \neq 0$, where $i \neq j$
I. Informal Tests
• They help to visualize whether there is an autocorrelation problem
A. Draw time series graph of the residual
> Follow these steps
i. Run the regression model
ii. Predict the error term (u_i)
iii. Draw time series line for u_i
Example:
reg NPL Loans
predict uhat, residual
tsline uhat
You can draw with reference line
tsline uhat if e(sample)==1, yline(0)
B. Correlogram and Autocorrelation Function (ACF)
Syntax: corrgram var
Example: corrgram uhat
• Plot the autocorrelation function
• The ACF relates a variable to its own lagged values: it gives the correlation coefficient between the variable and its k-th lag.
> i.e. it shows $\rho_k = \mathrm{cov}(Y_t, Y_{t-k}) / \mathrm{var}(Y_t)$
Syntax: ac var
Example: ac uhat
C. Run auxiliary regression
Syntax: reg uhat l.uhat
• If the coefficient is high and statistically significant, it indicates evidence of an autocorrelation problem.
> In our case the coefficient is high enough (0.369) and statistically significant at 5% (p-value = 0.002)
o This implies there is an autocorrelation problem in our model
II. Formal tests
• They help to get statistical evidence about the presence of an autocorrelation problem
A. Durbin-Watson test
> Follow these steps
i. Run the regression model
ii. Run the Durbin-Watson test
Syntax: estat dwatson
Example:
reg NPL Loans
estat dwatson
• Or you can use the 'dwstat' command instead of 'estat dwatson'
. estat dwatson
Durbin-Watson d-statistic(  2,    66) =  1.208847
Decision
• If the model has no autocorrelation problem, the d-statistic will be 2 (very close to 2).
• If the model has perfect (serious) positive autocorrelation, the d-statistic will be 0 (close to 0); with perfect negative autocorrelation it will be close to 4.
> For the above example, since the d-statistic is not 2 (not very close to 2), our model has some autocorrelation problem.
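For reference, the decision rule follows from the standard definition of the Durbin-Watson statistic. The symbols $\hat{u}_t$ (the OLS residual) and $\hat{\rho}$ (the first-order autocorrelation of the residuals) are notation introduced here, not taken from the slides.

```latex
d = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{T} \hat{u}_t^2} \approx 2\,(1 - \hat{\rho})
```

So d is near 2 when $\hat{\rho} \approx 0$, approaches 0 as $\hat{\rho} \to 1$, and approaches 4 as $\hat{\rho} \to -1$. With d of about 1.21 in the example, the approximation gives $\hat{\rho} \approx 1 - d/2 \approx 0.40$, which is broadly in line with the 0.369 coefficient from the auxiliary regression.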
B. Breusch-Godfrey Test
Syntax: estat bgodfrey (run after the regression; 'bgodfrey' is the older form of the command)
H0: no serial correlation
Decision: if the p-value is less than the intended level of significance, we have to reject H0
o There is an autocorrelation problem
> For the above example, since the p-value (0.0022) is lower than the intended level of significance (10%), we have to reject H0
o There is a serial correlation (autocorrelation) problem
Dealing with Autocorrelation
Use Cochrane-Orcutt regression
• The 'prais' command with the corc option is used to perform the Cochrane-Orcutt transformation
Example: prais NPL Loans, corc
Note: the Durbin-Watson and Breusch-Godfrey tests are not appropriate after corc regression.
o Because the problem is already corrected. In the example, running bgodfrey after prais stops with the error "This command only works after regress", r(301).
o Please look at the transformed d-statistic in the prais output instead.
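The detect-then-correct workflow described so far can be sketched on simulated data. Everything in this block (variable names y, x, u, e, the seed, the AR(1) coefficient of 0.6) is an assumption made for illustration and has nothing to do with the NPL/Loans data.

```stata
* Simulated regression with AR(1) errors (all names and values are illustrative)
clear
set seed 12345
set obs 200
generate t = _n
tsset t
generate x = rnormal()
generate e = rnormal()
* Build u_t = 0.6*u_{t-1} + e_t recursively, starting from u_1 = e_1
generate u = e
replace u = 0.6*u[_n-1] + e in 2/l
generate y = 1 + 2*x + u

* Step 1: OLS, then the two formal autocorrelation tests
regress y x
estat dwatson
estat bgodfrey, lags(1)

* Step 2: Cochrane-Orcutt correction; the output reports the estimated rho
* and both the original and the transformed Durbin-Watson statistics
prais y x, corc
```

Because the errors are generated with positive autocorrelation, the d-statistic from Step 1 would be expected to fall well below 2, and the transformed statistic after prais to move back towards 2.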
2. Stationarity (Unit Root) Test
• Non-stationary time series data will have a time-varying mean or a time-varying variance or both.
o This makes forecasting or prediction difficult.
I. Informal tests
• They help to visualize and check whether there is some sort of negative or positive trend in the data; such a trend is an indication of non-stationarity.
A. Autocorrelation graph
Syntax: ac var
Example:
ac NPL
ac Loans
CHAPTER FOUR: TIME SERIES & PANEL DATA, DEBRE MARKOS UNIVERSITY
B. Draw time series line graph
Syntax: tsline var
• With zero reference line
tsline var if e(sample)==1, yline(0)
Example: tsline NPL if e(sample)==1, yline(0)
• You can draw a time series line graph for multiple variables at a time
Example: tsline NPL Deposit Loans if e(sample)==1, yline(0)
II. Formal test
Dickey-Fuller (DF) test
• Suppose the following random walk model
$Y_t = \rho Y_{t-1} + u_t$
H0: unit root ($\rho = 1$), non-stationary
H1: no unit root ($|\rho| < 1$), stationary
> The general essence behind the unit root test of stationarity is, therefore, to find out if the estimated rho ($\rho$) is statistically equal to one
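In practice the test is usually run on a differenced form of the model. Subtracting $Y_{t-1}$ from both sides gives the regression below; the symbol $\delta$ is notation introduced here for $\rho - 1$.

```latex
\Delta Y_t = (\rho - 1)\,Y_{t-1} + u_t = \delta\,Y_{t-1} + u_t,
\qquad H_0: \delta = 0 \;(\rho = 1), \quad H_1: \delta < 0 \;(\rho < 1)
```

The test statistic is the t-ratio on $\hat{\delta}$, but it is compared with Dickey-Fuller critical values rather than the usual t-table, which is why the decision rule refers to special critical values and the MacKinnon p-value.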
Syntax: dfuller var
Examples:
dfuller NPL
dfuller Loans
Decision: if the absolute value of the test statistic is greater than the absolute value of the critical value at the intended significance level (usually 5%), we reject H0
o This implies that the data (variable) is stationary (no unit root).
• In other words, if the MacKinnon p-value > the intended significance level (usually 5%), we accept (fail to reject) H0
=> Both cases show that NPL and Loans are not stationary at level
• If our data is stationary, the process can be finished here
• And we can make estimation, prediction... using OLS without any problem
> i.e. we can run the regression 'reg NPL Loans' and interpret the result
• However, our data is not stationary, so we have to take some remedial measures before we run the regression.
• Otherwise our result will be spurious!
Dealing with Non-stationary Data
• Although our interest is in stationary time series, one often encounters nonstationary time series, the classic example being the random walk model (RWM).
• The following are the most common techniques used to transform nonstationary time series:
A. Detrending: Trend Stationary Processes (TSP)
• If the trend in a time series is completely predictable and not variable, we call it a deterministic trend; it can be treated by detrending.
• This procedure of removing the (deterministic) trend is called detrending.
i. Subtract the mean of Y from Y_t; the resulting series will be stationary, hence the name trend stationary.
Example: for the variable NPL
egen NPLbar = mean(NPL)
gen dNPL = (NPL - NPLbar)
And for the variable Loans
egen Loansbar = mean(Loans)
gen dLoans = (Loans - Loansbar)
> Then check stationarity of the new variables dNPL and dLoans
=> The DF results show that both variables are not becoming stationary by detrending.
Why?
ii. Regress the variable on time; the residuals from this regression will then be stationary.
Example: for the variable NPL
reg NPL Year
predict ui, residual
And for the variable Loans
reg Loans Year
predict ui2, residual
Then check stationarity of the new error terms (ui and ui2)
> If the error terms are stationary, use ui and ui2 in the regression instead of NPL and Loans
Syntax: reg ui ui2
The estimate of the above model will not be spurious!
=> However, the DF results for both error terms show that they are again not stationary.
Why?
> This is because the two variables are not TSPs but DSPs
> Let us check if they are DSPs!
B. Differencing: Difference Stationary Processes (DSP)
• If the trend is not predictable, we call it a stochastic trend
• It can be transformed to stationary by taking the nth difference of the variable
• This procedure of transforming non-stationary data to stationary by taking the nth difference is called differencing.
Syntaxes:
To generate the 1st difference of the variable (e.g. NPL)
gen d1NPL = d.NPL
To generate the 1st difference of the variable (e.g. Loans)
gen d1Loans = d.Loans
Then check stationarity of the new variables (d1NPL and d1Loans)
Note: you lose one observation with each differencing. Hence, take care of the sample size and df.
=> The DF result shows that both variables are stationary at first difference
=> This implies that both variables are DSP.
• If the first difference of the variable is not stationary, take the second difference, and so on.
To generate the 2nd difference of the variable (e.g. NPL)
gen d2NPL = d.d1NPL
Note: when you take more differences, you lose one more observation with each differencing. Hence, take care of the sample size and df.
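A minimal sketch of the differencing workflow on a simulated random walk, assuming illustrative names (y, e, t) and an arbitrary seed; it is not the NPL/Loans data. It also shows that dfuller accepts the D. and D2. operators directly, so generating new variables is optional.

```stata
* Simulated random walk (illustrative; not the NPL/Loans data)
clear
set seed 2024
set obs 200
generate t = _n
tsset t
generate e = rnormal()
* y_t = y_{t-1} + e_t, i.e. the running sum of the shocks
generate y = sum(e)

* Level: a random walk has a unit root, so H0 is typically not rejected
dfuller y
* First difference: D.y is the white-noise shock, so H0 is typically rejected
dfuller D.y
* Second difference via the D2. operator; two observations are lost in total
dfuller D2.y
```

A series that behaves like y here would be classified as difference stationary, since only the differenced series passes the test.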
Integrated Stochastic Processes
• A time series that can be made stationary by differencing is called an integrated stochastic process.
• Recall that the RWM without a drift is nonstationary but its first difference is stationary.
> Thus we call the RWM without a drift integrated of order 1, denoted as $y_t \sim I(1)$.
> Similarly, if a time series has to be differenced twice to make it stationary, such a time series is called integrated of order 2, denoted as $y_t \sim I(2)$.
o In general, if a nonstationary time series has to be differenced d times to make it stationary, it is said to be integrated of order d, $y_t \sim I(d)$.
> If a time series $y_t$ is stationary from the start, it is called integrated of order 0, $y_t \sim I(0)$.
Note: the terms 'stationary time series' and 'time series integrated of order zero' mean the same thing.
Properties of integrated series
Let $x_t$, $y_t$ and $z_t$ be three time series:
• If $x_t \sim I(0)$ and $y_t \sim I(1)$, then $z_t = (x_t + y_t)$ is $I(1)$.
  - The sum of a stationary and a nonstationary time series is nonstationary.
• If $x_t \sim I(d)$, then $y_t = (a + b x_t) \sim I(d)$, where a and b are constants.
  - A linear transformation of an I(d) series is also I(d).
• If $x_t \sim I(d_1)$ and $y_t \sim I(d_2)$, then $z_t = (a x_t + b y_t) \sim I(d_1)$, where $d_1 > d_2$.
• If $x_t \sim I(d)$ and $y_t \sim I(d)$, then $z_t = (a x_t + b y_t) \sim I(d^*)$, where $d^* = d$ in general, but sometimes $d^* < d$.
Example: If $x_t \sim I(1)$ and $y_t \sim I(1)$, then it can be that $z_t = (a x_t + b y_t) \sim I(0)$
> The possibility of finding stationary linear combinations of nonstationary time series is known as cointegration!
Cointegration
• We have seen that the regression of a nonstationary time series on another nonstationary time series may produce a spurious regression.
• However, this may not always happen.
• If the variables are integrated of the same order, the regression of a unit root time series on another unit root time series may sometimes give us a non-spurious regression.
> The variables are co-integrated!
> i.e. the error term will be stationary!
> This makes estimation of the model using OLS possible!
> We have seen that NPL and Loans are stationary at 1st difference
o This means that they are integrated of order 1, $y_t \sim I(1)$.
o There may be a possibility of cointegration between the two variables
o If so, the result of the OLS regression 'reg NPL Loans' will not be spurious, though the two variables are not stationary at level [not integrated of order 0, $y_t \sim I(0)$]
o Therefore, let us conduct the cointegration test
• The existence of cointegration amongst sets of nonstationary time series has three important implications.
1. The existence of dynamic long-run equilibria (co-movement relationship)
2. The long-run parameter estimates converge to their population values
3. It allows for specification of both long-run and short-run dynamics.
The Choice of Appropriate Lag Length
• Before conducting the cointegration test, we need to know that the variables are integrated of the same order
• We also have to determine the appropriate lag length
> Use the following two-step commands
Syntaxes:
varbasic vars
varsoc
Example:
varbasic NPL Loans
varsoc
Decision: the stars on each information criterion (IC) indicate the appropriate lag length.
> For the above example, the appropriate lag length is 1.
Testing for Cointegration
• A number of methods for testing cointegration have been proposed in the literature. We consider here two comparatively simple methods:
Johansen co-integration test
• Since the appropriate lag length is 1, let us conduct the test at this lag
H0: no co-integration
H1: there is co-integration
Syntax: vecrank vars, lags(#)
Example:
vecrank NPL Loans, lags(1)
Decision: if the trace statistic is greater than the critical value, there is co-integration (reject H0)
=> Hence, there is co-integration between NPL and Loans!
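The lag-selection and Johansen steps can be tried on a simulated pair that is cointegrated by construction. The names y and x, the seed, and the coefficients are assumptions for illustration only. Here varsoc is called with the variable list directly, which is an alternative to running varbasic first.

```stata
* Simulated cointegrated pair (illustrative names and values)
clear
set seed 777
set obs 300
generate t = _n
tsset t
* x is a random walk; y = 2 + 0.5*x + stationary noise,
* so y and x share a single stochastic trend
generate x = sum(rnormal())
generate y = 2 + 0.5*x + rnormal()

* Lag selection: starred entries mark the lag chosen by each criterion
varsoc y x, maxlag(4)

* Johansen trace test; lags(1) is used because this simulated system is a
* VAR(1) in levels. In practice, use the lag length selected by varsoc
vecrank y x, lags(1)
```

In the vecrank output, the starred row of the trace statistics marks the number of cointegrating relations the test selects.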
• The result of the Johansen co-integration test shows a co-integration relationship
• This implies that the result of the OLS regression will not be spurious!
> So we can estimate the OLS result of the model, which shows the long-run relationship between the variables.
reg NPL Loans
Error Correction Model (ECM)
• We just showed that NPL and Loans are cointegrated.
> There is a long-term, or equilibrium, relationship between the two.
> However, there may be disequilibrium in the short run
• The SR relationship is estimated using the ECM
• The ECM for the above model is specified as
$\Delta NPL_t = \beta_0 + \gamma \Delta Loans_t + \theta\, residual_{t-1} + \varepsilon_t$
Why?
Steps:
Step 1: Estimate the model and obtain the residual
Step 2: Estimate the model using the 1st difference of the variables and the 1st lag of the residual
Syntaxes:
reg d.NPL d.Loans l.ui
=> Based on the coefficient of the residual, it takes more than one period (1/0.556 ≈ 1.8, i.e. about 2 years) to correct the deviation from the LR equilibrium
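The two ECM steps can be sketched end to end on simulated data. The variables y and x stand in for NPL and Loans, and ehat is a hypothetical name for the Step 1 residual; none of these come from the original dataset.

```stata
* Simulated cointegrated pair (illustrative; stands in for NPL and Loans)
clear
set seed 4321
set obs 300
generate t = _n
tsset t
generate x = sum(rnormal())
generate y = 2 + 0.5*x + rnormal()

* Step 1: long-run regression and its residual (the equilibrium error)
regress y x
predict ehat, residuals

* Step 2: short-run dynamics with the lagged equilibrium error
regress D.y D.x L.ehat

* Implied adjustment time in periods; meaningful when the coefficient
* on L.ehat is negative, i.e. deviations are being corrected
display 1/abs(_b[L.ehat])
```

A negative coefficient on L.ehat indicates that deviations from the long-run relationship shrink over time, and its absolute value is the share of the deviation corrected per period.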
4.2. Introduction to Panel Regression Model
What is Panel Data
• So far, we have seen regression analysis using cross-sectional and time series data separately.
• However, these two types of data can come together
• Such data are called panel or pooled data
o (pooling of time series and cross-sectional observations)
> Panel data are also known as longitudinal or cross-sectional time-series data
> A panel data set is one where there are repeated observations on a set of entities that remain the same through time.
Note: panel (longitudinal) data and pooled data are not exactly the same thing.
Types of Panel Data
Short panel: many individuals and few time periods
Long panel: many time periods and few individuals
Both: many time periods and many individuals
Balanced panel: when all individuals are observed in all time periods
Unbalanced panel: when some individuals are not observed in all time periods
Regressors
• Varying regressors $x_{it}$
  o annual income for a person, annual consumption of a product
• Time-invariant regressors $x_{it} = x_i$ for all t
  o gender, race, education
• Individual-invariant regressors $x_{it} = x_t$ for all i
  o time trend, economy trends such as unemployment rate
Variation for the dependent variable and regressors
• Between variation: variation between individuals
• Within variation: variation within individuals (over time)
• Overall variation: variation over time and individuals
• Panel data models have the following general form:
$y_{it} = \alpha + \beta x_{it} + u_{it}$
Where:
i stands for the i-th cross-sectional unit and t for the time period
x is the regressor
y is the usual dependent variable
u_it is the error term, assumed to follow the classical assumptions of zero mean and constant variance.
For example, suppose you want to estimate the effect of capital stock on RGDP in Ethiopia and Kenya between 1972 and 2021.
Then the model becomes
$RGDP_{it} = \alpha + \beta\, CapStock_{it} + u_{it}$
Where:
i stands for either Ethiopia or Kenya and t for the time period
As with time series data, the first thing we need to do is declare the data as panel data before conducting any estimation.
• To declare the data as panel data use the following commands
> To set the cross-sectional dimension of the data (id), e.g. to set Country_Code as id
Syntax: global ID id
> To set the time series dimension of the data (year), e.g. to set Year as year
Syntax: global Year year
> Finally declare the data as panel
Syntax: xtset ID Year
The output reports the time variable as Year with delta 1 and shows that our data (model) is a balanced panel.
This is the same as saying you have no missing observations.
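Suppose the following simple hypothetical data on Cons for three individuals observed over three periods:
id 1: 9, 10, 11 (individual mean 10)
id 2: 20, 20, 20 (individual mean 20)
id 3: 25, 30, 35 (individual mean 30)
Overall mean: 20
Overall variance: $s_O^2 = \frac{1}{nT-1}\sum_i\sum_t (x_{it} - \bar{x})^2$
$= \frac{1}{8}[(9-20)^2+(10-20)^2+(11-20)^2+3(20-20)^2+(25-20)^2+(30-20)^2+(35-20)^2] = \frac{1}{8}(652) = 81.5$, so $s_O = \sqrt{81.5} = 9.0277$
Within variance: $s_W^2 = \frac{1}{nT-1}\sum_i\sum_t (x_{it} - \bar{x}_i)^2$
$= \frac{1}{8}[(9-10)^2+(10-10)^2+(11-10)^2+3(20-20)^2+(25-30)^2+(30-30)^2+(35-30)^2] = \frac{1}{8}(52) = 6.5$, so $s_W = \sqrt{6.5} = 2.5495$
Between variance: $s_B^2 = \frac{1}{n-1}\sum_i (\bar{x}_i - \bar{x})^2 = \frac{1}{2}[(10-20)^2+(20-20)^2+(30-20)^2] = 100$, so $s_B = \sqrt{100} = 10$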
Syntax: xtsum id year Cons
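The hand calculation can be checked by typing the nine hypothetical values into Stata. The variable names id, year and cons and the period labels 1 to 3 are assumptions made for this sketch.

```stata
* The nine hypothetical values, three individuals by three periods
* (variable names and period labels are illustrative)
clear
input id year cons
1 1 9
1 2 10
1 3 11
2 1 20
2 2 20
2 3 20
3 1 25
3 2 30
3 3 35
end
xtset id year
xtsum cons
* Overall, between and within standard deviations stored by xtsum
display r(sd) "  " r(sd_b) "  " r(sd_w)
```

The three displayed values correspond to the square roots of the overall, between and within variances, so they can be compared directly with the figures worked out by hand.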
• Let us go back to the data we are working on and take an overview
Syntax: xtsum Year ID RGDP Capitalstock
• RGDP has more within variation ($728140) than between variation ($51230.31)
• The average RGDP of an individual is between $242726.3 and $315176.9 across individuals
Note: time-invariant variables like the individual ID have zero within variation.
• Also, individual-invariant regressors such as time (Year) have zero between variation.
Why?
> ID has zero within variation because it doesn't change (vary) with time.
> Year has zero between variation because it doesn't change (vary) between the two groups (Ethiopia and Kenya)
Estimation of panel data regression model
• Estimation of a panel data model depends on the assumptions we make about the intercept, the slope coefficients, and the error term, u_it.
Based on these assumptions, there are three types of models:
o The pooled model (pooled OLS)
o The fixed effects model
o The random effects model
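One common way to write the three models side by side is shown below. The unit-specific intercept $\alpha_i$ and the random unit-specific component $\varepsilon_i$ are notation introduced here; the slides only give the general form with $\alpha$, $\beta$ and $u_{it}$.

```latex
\begin{aligned}
\text{Pooled OLS:}     \quad & y_{it} = \alpha   + \beta x_{it} + u_{it} \\
\text{Fixed effects:}  \quad & y_{it} = \alpha_i + \beta x_{it} + u_{it} \\
\text{Random effects:} \quad & y_{it} = \alpha   + \beta x_{it} + (\varepsilon_i + u_{it})
\end{aligned}
```

In the fixed effects model each unit has its own intercept $\alpha_i$, which may be correlated with $x_{it}$. In the random effects model the unit-specific component $\varepsilon_i$ is treated as part of the error term and is assumed to be uncorrelated with $x_{it}$.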
Pooled OLS
Assumption: all coefficients are constant across time and individuals
• It is the simplest, and possibly naive, approach, as it disregards the space and time dimensions of the pooled data and just estimates the usual OLS regression.
• The problem with this model is that it doesn't distinguish between cross-sections (countries)
• By combining (pooling) these countries, it denies the heterogeneity or individuality that may exist between countries
> This is the most restrictive panel data model and is not used much in the literature.
Choosing between fixed and random effects
• To identify which model to use, we can conduct the Hausman test
H0: random effect model is consistent (and efficient)
H1: random effect model is inconsistent; only the fixed effect model is consistent
> Run the following commands in sequence
quietly xtreg RGDP Capitalstock, fe
estimates store fixed
quietly xtreg RGDP Capitalstock, re
estimates store random
hausman fixed random
Decision: if the p-value is significant (p-value < the chosen significance level, usually 10%), reject H0. This implies that it is better to use the fixed effect model
> For the above example, the p-value (0.0044) is less than the intended level of significance.
o So, we have to reject H0
o This implies that the fixed effect model is consistent for our data
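To see the test reject in a controlled setting, the sketch below simulates a panel in which the unit effect is correlated with the regressor, which is the situation where random effects is inconsistent. All names (id, year, a, x, y), the seed and the coefficients are assumptions for illustration.

```stata
* Simulated panel: 50 units, 10 periods, unit effect correlated with x
* (all names and values are illustrative)
clear
set seed 99
set obs 50
generate id = _n
generate a = rnormal()
expand 10
bysort id: generate year = _n
xtset id year
* x depends on the unit effect a, which violates the RE assumption
generate x = a + rnormal()
generate y = 1 + 2*x + 3*a + rnormal()

quietly xtreg y x, fe
estimates store fixed
quietly xtreg y x, re
estimates store random
hausman fixed random
```

With data built this way, the fixed and random effects slopes would be expected to differ noticeably, and the Hausman p-value to be small.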
Fixed Effect Model
• Use the following command to run the fixed effect model
Syntax: xtreg RGDP Capitalstock, fe
• corr(u_i, Xb) = -0.0689 shows the correlation of the errors u_i with the regressors.
• The value of rho shows that 2.6% of the variance (variation in RGDP) is due to differences across panels.
Where:
$\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$, the fraction of variance due to u_i
sigma_u = sd of residuals within groups, u_i
sigma_e = sd of residuals (overall error term), e_it
Random Effect Model
• Use the following command to run the random effect model
Syntax: xtreg RGDP Capitalstock, re
• corr(u_i, X) = 0 (assumed) shows the correlation of the errors u_i with the regressors.
> Because it is an RE model
• The value of sigma_u (the residual within the group) is [...] because the RE model considers variation between the groups.