A Quick Review of How to Use SAS to
Analyze Time Series Data
1. Get to know SAS
How to Start SAS?
If you use a computer in this laboratory, start SAS from the Desktop or from Start/Programs.
You can also use SAS in the laboratory of the university's Computer Center, or on the university
server if you have permission.
You can get a temporary license for SAS by contacting our computer assistant.
Five main windows
Program Editor -- Edit SAS programs
Log -- Records the messages of the SAS session, which is very helpful for debugging
programs
Output -- Displays output from SAS procedures
Explorer -- Manages SAS data sets and creates new libraries
Results -- Shows a tree-like summary of the Output window
Several important shortcuts
Open a new Program Editor window
Open a SAS program that was written earlier
Save your program as an external file
Create a new library
Open the Explorer window to manage SAS data sets
Submit the whole program, or only a few selected lines, to the SAS System
2. How to use SAS
Two important concepts
SAS library -- A folder in which SAS data sets are stored. You can create a new library with the
LIBNAME statement or with the shortcut.
SAS data set -- A data set is either temporary or permanent.
Structure of SAS program
DATA step -- Processes a SAS data set, or converts raw data into a SAS data set that the SAS
System can recognize and that a PROC step can process.
=====================================
DATA dataset-name;
INPUT variable <format>;
CARDS;
... data lines ...
;
RUN;
=====================================
The data set name may contain only letters (a, b, ...), digits, and underscores (_), and must
begin with a letter or an underscore. SAS Version 6 limits the name to 8 characters; Version 7
and later allow up to 32.
PROC step -- Analyzes a SAS data set and outputs the results of the analysis.
=====================================
PROC procedure-name DATA=dataset-name;
RUN;
=====================================
The procedure name is the name of a SAS procedure, such as PRINT, PLOT, GPLOT, or INSIGHT.
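As a concrete instance of the PROC step template, the sketch below prints and summarizes SASHELP.CLASS, a sample data set that ships with SAS. The choice of data set and variables is only an illustration and is not part of the original handout.

```sas
/* Print the first five observations of a sample data set */
proc print data=sashelp.class(obs=5);
run;

/* Summary statistics for two numeric variables */
proc means data=sashelp.class;
  var height weight;
run;
```

Each PROC step ends at its RUN statement; the results appear in the Output window and any messages appear in the Log window.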
3. Change raw data into SAS dataset
Create a new library
Library name: lib1
Physical path: D:\example
Using a SAS program:
libname lib1 'D:\example';
Using the shortcut: enter the library name and the physical path above in the New Library dialog.
SAS data set name
library_name.dataset_name
For example, lib1.blood means that the data set blood is saved in the library lib1.
The library_name can be sashelp, sasuser, maps, work, or a library you created, such as lib1. The
dataset_name is up to you, for example blood.
When library_name is work, the data set work.dataset_name is a temporary SAS data set,
which is deleted automatically when you close SAS. In this case the prefix work can be
omitted: blood and work.blood refer to the same data set.
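The difference between a permanent and a temporary data set can be seen in a minimal sketch. It assumes that the folder D:\example exists, and it copies the sample data set SASHELP.CLASS purely for illustration.

```sas
libname lib1 'D:\example';   /* the folder must already exist */

/* Permanent copy: saved in D:\example and kept after SAS closes */
data lib1.class;
  set sashelp.class;
run;

/* Temporary copy: WORK.CLASS, deleted when the session ends */
data class;                  /* same as: data work.class; */
  set lib1.class;
run;
```

After restarting SAS and reissuing the LIBNAME statement, lib1.class is still available, while class has to be created again.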
Three methods to deal with data through the DATA step
The raw data set is small (type the data into the program).
DATA dataset-name;
INPUT variable <format>;
CARDS;
... data lines ...
;
RUN;
The data are saved in an external file.
DATA dataset-name;
INFILE 'physical path';
INPUT variable <format>;
RUN;
The data that you want to process are already a SAS data set.
DATA dataset-name;
SET existing-dataset-name;
RUN;
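The three templates above might look as follows with concrete names. The variables (id, sex, pressure), the data values, and the file D:\example\blood.txt are hypothetical and serve only as an illustration; DATALINES is a synonym for CARDS.

```sas
/* Method 1: small raw data typed into the program */
data blood;
  input id sex $ pressure;     /* $ marks a character variable */
  datalines;
1 M 120
2 F 135
3 M 128
;
run;

/* Method 2: raw data stored in an external text file */
data blood2;
  infile 'D:\example\blood.txt';   /* hypothetical file with the same layout */
  input id sex $ pressure;
run;

/* Method 3: start from an existing SAS data set */
data highbp;
  set blood;
  if pressure > 125;           /* subsetting IF keeps only these rows */
run;
```

Method 3 is also the usual way to add derived variables, as is done later for the log transformation of the airline data.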
4. SAS Application without programming
SAS/INSIGHT
How to Start SAS/INSIGHT?
o PROC INSIGHT DATA=dataset-name; RUN;
o Solutions --- Analysis --- Interactive Data Analysis
It can draw several types of graphs, such as line plots, scatter plots, rotating plots (3 dimensions), and scatter plot matrices.
It can also do some simple statistical analysis.
5. How to use SAS in time series analysis
Time Series Forecasting System (without programming)
Solutions --- Analysis --- Time Series Forecasting System
Using SAS procedures
ARIMA and AUTOREG procedures.
6. Some commonly used options in the ARIMA procedure
Syntax
PROC ARIMA options;
BY variables;
IDENTIFY VAR=variable options;
ESTIMATE options;
OUTLIER options;
FORECAST options;
RUN;
QUIT;
BY
A BY statement can be used in the ARIMA procedure to process a data set in groups of
observations defined by the BY variables. Note that all IDENTIFY, ESTIMATE, and FORECAST
statements specified are applied to all BY groups.
IDENTIFY
ALPHA= significance-level: The ALPHA= option specifies the significance level for tests in the
IDENTIFY statement. The default is 0.05.
ESACF: computes the extended sample autocorrelation function and uses these estimates to
tentatively identify the autoregressive and moving average orders of mixed models.
The ESACF option generates two tables. The first table displays extended sample
autocorrelation estimates, and the second table displays probability values that can be used to
test the significance of these estimates. The P= (pmin: pmax) and Q= (qmin: qmax) options
determine the size of the table.
NLAG= number: indicates the number of lags to consider in computing the autocorrelations and
cross-correlations.
STATIONARITY=(ADF= AR orders DLAG= s) or STATIONARITY=(DICKEY= AR orders DLAG= s):
performs augmented Dickey-Fuller tests. If the DLAG=s option is specified with s greater than
one, seasonal Dickey-Fuller tests are performed. The maximum allowable value of s is 12. The
default value of s is one.
VAR= variable ( d1, d2, ..., dk ): names the variable containing the time series to analyze. The
VAR= option is required. A list of differencing lags can be placed in parentheses after the
variable name to request that the series be differenced at these lags. For example, VAR=X(1)
takes the first differences of X. VAR=X(1,1) requests that X be differenced twice, both times with
lag 1, producing a second difference series, which is
$(X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}$.
VAR=X(2) differences X once at lag two, $X_t - X_{t-2}$. If differencing is specified, it is the
differenced series that is processed by any subsequent ESTIMATE statement.
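The differencing lists can be read as products of difference operators. The side note below uses the backshift operator $B$ (with $B X_t = X_{t-1}$), a notation assumed here rather than taken from the text above, and includes the combination VAR=X(1,12) of a regular and a seasonal difference.

```latex
\begin{aligned}
\texttt{VAR=X(1)}    &:\ (1-B)X_t = X_t - X_{t-1} \\
\texttt{VAR=X(1,1)}  &:\ (1-B)^2 X_t = X_t - 2X_{t-1} + X_{t-2} \\
\texttt{VAR=X(2)}    &:\ (1-B^2)X_t = X_t - X_{t-2} \\
\texttt{VAR=X(1,12)} &:\ (1-B)(1-B^{12})X_t = X_t - X_{t-1} - X_{t-12} + X_{t-13}
\end{aligned}
```

Each differencing lag costs that many observations at the start of the series, so VAR=X(1,12) leaves n - 13 usable values from a series of length n.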
ESTIMATE
METHOD=ML | ULS | CLS: specifies the estimation method to use. METHOD=ML specifies the
maximum likelihood method. METHOD=ULS specifies the unconditional least-squares method.
METHOD=CLS specifies the conditional least-squares method. METHOD=CLS is the default.
P= order: specifies the autoregressive part of the model. By default, no autoregressive
parameters are fit. P=(l1, l2, ..., lk) defines a model with autoregressive parameters at the
specified lags. P= order is equivalent to P=(1, 2, ..., order). A concatenation of parenthesized lists
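specifies a factored model. For example, P=(1,2,5)(6,12) specifies the autoregressive model
$(1 - \phi_{1,1}B - \phi_{1,2}B^2 - \phi_{1,3}B^5)(1 - \phi_{2,1}B^6 - \phi_{2,2}B^{12})$, where $B$ is the backshift operator ($B X_t = X_{t-1}$).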
Q= order: specifies the moving average part of the model.
NOCONSTANT/NOINT: suppresses the fitting of a constant (or intercept) parameter in the
model. (That is, the parameter μ is omitted.)
PLOT: plots the residual autocorrelation functions. The sample autocorrelation, the sample
inverse autocorrelation, and the sample partial autocorrelation functions of the model residuals
are plotted.
FORECAST
ALPHA= n: sets the size of the forecast confidence limits. The ALPHA= value must be between 0
and 1. When you specify ALPHA=n, the upper and lower confidence limits will have a 1 - n
confidence level. The default is ALPHA=.05, which produces 95% confidence intervals. ALPHA values are
rounded to the nearest hundredth.
ID= variable: names a variable in the input data set that identifies the time periods associated
with the observations.
INTERVAL= interval | n: specifies the time interval between observations.
LEAD= n: specifies the number of multistep forecast values to compute.
OUT= SAS-data-set: writes the forecast (and other values) to an output data set.
Fitting the ARIMA Model to a Simulated Time Series
0. Simulate an AR(2) time series
The model: Z(t) = 0.5*Z(t-1) + 0.4*Z(t-2) + a(t), where a(t) is standard normal white noise.
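A quick side calculation, added here and not part of the original handout, shows that this model is stationary but close to a unit-root process, which helps explain why differenced models turn up among the candidates later. Here $B$ denotes the backshift operator, an assumed notation.

```latex
\phi(B) = 1 - 0.5B - 0.4B^2 = 0
\;\Longrightarrow\;
B = \frac{-0.5 \pm \sqrt{0.5^2 + 4(0.4)}}{2(0.4)}
  = \frac{-0.5 \pm \sqrt{1.85}}{0.8}
\approx 1.075 \ \text{or}\ -2.325 .
```

Both roots lie outside the unit circle, so the process is stationary (equivalently, $\phi_1 + \phi_2 = 0.9 < 1$, $\phi_2 - \phi_1 = -0.1 < 1$, and $|\phi_2| < 1$). The root 1.075 is close to 1, so the autocorrelations of the simulated series decay slowly.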
The SAS program:
/* Create a new library */
libname ts 'D:/TimeSeries';
/* Simulate an AR(2) process */
data ts.ar;
z1=0; z2=0;
do t = -50 to 200;
a = rannor( 32565 );
z = z1*0.5 + z2*0.4 + a;
if t > 0 then output;
z2=z1; z1=z;
end;
keep z t;
run;
Simulate an MA(2) series, Z(t) = a(t) + 0.2*a(t-1) + 0.5*a(t-2):
/* Simulate an MA(2) process */
data ts.ma;
a1=0; a2=0;
do t = -50 to 200;
a = rannor( 32565 );
z = a + a1*0.2+a2*0.5;
if t > 0 then output;
a2=a1; a1=a;
end;
keep z t;
run;
Simulate an ARMA(1,1) series, Z(t) = 0.5*Z(t-1) + a(t) + 0.3*a(t-1):
/* Simulate an ARMA(1,1) process */
data ts.arma;
z1=0; a1=0;
do t = -50 to 200;
a = rannor( 32565 );
z = z1*0.5 + a + a1*0.3;
if t > 0 then output;
a1=a; z1=z;
end;
keep z t;
run;
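In the programs above, RANNOR uses its seed argument only on the first call, so repeating the same seed inside the loop does not restart the random stream. As an alternative sketch, the same AR(2) model can be simulated with the newer RAND function and CALL STREAMINIT. The data set name ar_rand is illustrative, and the generated values differ from those produced by RANNOR.

```sas
/* Simulate the AR(2) model Z(t) = 0.5*Z(t-1) + 0.4*Z(t-2) + a(t) with RAND */
data ar_rand;                      /* temporary data set WORK.AR_RAND */
  call streaminit(32565);          /* set the seed once, before the first RAND call */
  z1 = 0; z2 = 0;                  /* starting values for Z(t-1) and Z(t-2) */
  do t = -50 to 200;
    a = rand('normal');            /* standard normal white noise */
    z = 0.5*z1 + 0.4*z2 + a;
    if t > 0 then output;          /* drop t = -50..0 as burn-in */
    z2 = z1; z1 = z;               /* shift the lags */
  end;
  keep z t;
run;
```

The burn-in period reduces the influence of the arbitrary zero starting values on the 200 observations that are kept.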
1. Draw the time plot
The SAS program:
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.ar;
plot z*t;
run;
quit;
The result: [Figure: time plot of the simulated AR(2) time series]
2. Identify some suitable models
The SAS program:
/* Identify some suitable models with minimum requirement */
proc arima data=ts.ar;
identify alpha=0.05 var=z nlag=20;
run;
/* Use EACF to identify the orders of ARMA models */
identify alpha=0.05 var=z nlag=20 esacf p=(0:6) q=(0:8);
run;
/* Use Dickey-Fuller unit root tests to check the stationarity */
identify alpha=0.05 var=z nlag=20 stationarity=(dickey=(1, 2, 4));
run;
/* Take differencing on the data and analyze again */
identify alpha=0.05 var=z(1) nlag=20 stationarity=(dickey=5);
run;
quit;
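The summary of the output:
The detailed output without differencing: [series correlation panel]
The augmented Dickey-Fuller table reports different values of k (the AR orders requested in
DICKEY=), 3 different tests (Rho, Tau, and F), and 3 deterministic trends (zero mean, single
mean, and trend).
The detailed output after first differencing: [series correlation panel]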
We may reach three possible models:
ARIMA(3,0,0); ARIMA(0,1,1); and ARIMA(2,1,0).
3. Estimate the models
Candidate models: AR(3), ARMA(3,1) with AR coefficient at lag 2 suppressed and ARIMA(2,1,0)
without intercept.
The SAS program:
/* Identify some suitable models with minimum requirement */
proc arima data=ts.ar;
identify alpha=0.05 var=z nlag=20;
run;
/* Use EACF to identify the orders of ARMA models */
identify alpha=0.05 var=z nlag=20 esacf p=(0:6) q=(0:8);
run;
/* Use Dickey-Fuller unit root tests to check the stationarity */
identify alpha=0.05 var=z nlag=20 stationarity=(dickey=(1, 2, 4));
run;
/* Take differencing on the data and analyze again */
identify alpha=0.05 var=z(1) nlag=20 stationarity=(dickey=5);
run;
/* Use CLS method to estimate the AR(3) model */
identify var=z;
run;
estimate method=cls p=3 plot;
run;
/* Use ULS method to estimate the ARMA(3,1) model */
/* with the AR coefficient at lag 2 suppressed */
estimate method=uls p=(1,3) q=1 plot;
run;
/* Use ML method to estimate the ARIMA(2,1,0) model without
intercept */
identify var=z(1);
run;
estimate method=ml p=2 noint plot;
run;
quit;
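Written out, the three ESTIMATE statements in the program correspond to the following models. The notation ($B$ for the backshift operator, $\mu$ for the mean, $a_t$ for the white noise) is assumed here; the minus sign in the moving average factor follows the sign convention of PROC ARIMA.

```latex
\begin{aligned}
\texttt{p=3}:\quad & (1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(Z_t - \mu) = a_t \\
\texttt{p=(1,3) q=1}:\quad & (1 - \phi_1 B - \phi_3 B^3)(Z_t - \mu) = (1 - \theta_1 B)\,a_t \\
\texttt{var=z(1), p=2 noint}:\quad & (1 - \phi_1 B - \phi_2 B^2)(1 - B)Z_t = a_t
\end{aligned}
```

Because of this sign convention, a moving average term that enters a simulation with a plus sign, such as 0.2*a(t-1), is reported by PROC ARIMA with a negative estimate.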
The summary of the output:
The estimated AR(3) model:
The important outputs for the fitted AR(3) model: estimated parameters, mean, intercept,
variance of the white noise, standard deviation of the white noise, and p-values of significance.
Outputs for ARMA(3,1) with AR coefficient at lag 2 suppressed:
Outputs for ARIMA(2,1,0) without intercept:
4. Diagnostic checking for the fitted ARIMA(2,1,0)
The SAS program:
/* Diagnostic checking for the fitted ARIMA(2,1,0) */
proc arima data=ts.ar;
identify var=z(1);
run;
estimate method=ml p=2 noint plot;
run;
forecast out=ts.dc lead=0 id=t;
run;
quit;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.dc;
plot residual*t;
run;
quit;
/* Perform the normality test */
proc univariate data=ts.dc normal plot;
var residual;
run;
The summary of the output:
The time plot:
A normality test:
Distribution plot and Q-Q plot for normality:
Sample autocorrelation function (ACF) of the residuals and Sample partial ACF of the residuals:
Ljung-Box test: test statistic, degrees of freedom, and p-values.
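For reference, the Ljung-Box statistic behind this table can be written as follows, where $n$ is the number of residuals, $r_k$ is the lag-$k$ sample autocorrelation of the residuals, and $p$ and $q$ are the numbers of estimated AR and MA parameters (symbols assumed here for illustration).

```latex
Q_m = n(n+2)\sum_{k=1}^{m}\frac{r_k^2}{n-k},
\qquad
Q_m \ \text{is approximately}\ \chi^2_{\,m-p-q}\ \text{if the residuals are white noise.}
```

For the ARIMA(2,1,0) model, $p + q = 2$, so the test up to lag 6 has 4 degrees of freedom. Large p-values indicate no evidence of remaining autocorrelation in the residuals.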
Analysis of over-parameterized models:
o The SAS program:
/* Analysis of over-parameterized models */
proc arima data=ts.ar;
identify var=z(1) nlag=20;
run;
estimate method=ml p=2 noint plot;
run;
estimate method=ml p=(1,2)(6) noint plot;
run;
estimate method=ml p=2 q=(6) noint plot;
run;
quit;
o The first over-parameterized model based on the sample partial ACF:
o The second over-parameterized model based on the sample ACF:
o Three fitted models:
o Conclusion: the fitted ARIMA(2,1,0) model is not adequate.
5. Do forecasting with the fitted ARIMA(2,1,0) model
The SAS program:
/* Do forecasting by using the fitted ARIMA(2,1,0) model */
proc arima data=ts.ar;
identify var=z(1) nlag=20;
run;
estimate method=ml p=2 noint plot;
run;
forecast out=ts.out lead=50 id=t;
run;
quit;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.out;
plot z*t=1 forecast*t=2 l95*t=3 u95*t=3/overlay;
run;
quit;
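To inspect the numbers behind the plot, the forecast rows of the output data set can be listed. This is a minimal sketch; it assumes that the input series ends at t = 200 and that, without an INTERVAL= option, the ID variable t is incremented by one for each forecast period.

```sas
/* List the 50 forecasts with their standard errors and 95% limits */
proc print data=ts.out;
  where t > 200;               /* rows beyond the observed series */
  var t forecast std l95 u95;
run;
```

The variables FORECAST, STD, L95, and U95 are created by the FORECAST statement; the limits are named after the confidence level, so a different ALPHA= value changes their names.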
The results:
Fitting the Seasonal ARIMA Model to
The Airline Passenger Data
0. The data
The airline passenger data record the monthly number of airline passengers from
January 1949 to December 1960.
The data set is given as Series G in Box and Jenkins (1976) and has been used in the time series
literature as a standard example of a nonstationary seasonal time series.
1. Draw the time plot
The SAS program:
/* Create a new library */
libname ts 'D:/TimeSeries';
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=sashelp.air;
plot air*date;
run;
quit;
The time plot:
Take the log transformation and draw the time plot again.
/* Take log transformation*/
data ts.lair;
set sashelp.air;
lair=log(air);
run;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.lair;
plot lair*date;
run;
quit;
The time plot:
2. Identify some suitable models
The SAS program:
/* Identify some suitable models*/
proc arima data=ts.lair;
identify alpha=0.05 var=lair;
run;
/* Take differencing since the sample ACF decays slowly */
identify alpha=0.05 var=lair(1);
run;
/* Take seasonal differencing since the sample ACF decays slowly
at the seasonal lags (multiples of 12) */
identify alpha=0.05 var=lair(1,12);
run;
The sample ACF of the original series:
The sample ACF of the series after regular (lag 1) differencing:
The sample ACF of the series after both regular and seasonal (lag 12) differencing:
3. Estimate the seasonal ARIMA(0,1,1)X(0,1,1)12 model
The SAS program:
proc arima data=ts.lair;
identify alpha=0.05 var=lair(1,12);
run;
/* Estimate the ARIMA(0,1,1)X(0,1,1)12 model to the data */
estimate method=ml q=(1)(12) plot;
run;
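The model requested by q=(1)(12) for lair(1,12) can be written out as follows. Here $X_t$ is the passenger count, $B$ is the backshift operator, and $\mu$ is the constant that PROC ARIMA fits unless NOCONSTANT is specified; the symbols are assumptions made for this note.

```latex
(1 - B)(1 - B^{12})\log X_t = \mu + (1 - \theta_1 B)(1 - \Theta_1 B^{12})\,a_t
= \mu + a_t - \theta_1 a_{t-1} - \Theta_1 a_{t-12} + \theta_1\Theta_1 a_{t-13}.
```

In the PROC ARIMA output, $\theta_1$ is labeled MA1,1 (first factor, lag 1) and $\Theta_1$ is labeled MA2,1 (second factor, lag 12).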
The estimated model:
4. Diagnostic checking for the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model
The SAS program:
proc arima data=ts.lair;
identify alpha=0.05 var=lair(1,12);
run;
/* Estimate the ARIMA(0,1,1)X(0,1,1)12 model to the data */
estimate method=ml q=(1)(12) plot;
run;
/* Diagnostic checking by overfit AR part */
estimate method=ml p=(9) q=(1)(12) plot;
run;
/* Diagnostic checking by overfit MA part */
estimate method=ml q=(1)(12)(23) plot;
run;
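/* Re-estimate the target model: FORECAST uses the most recent ESTIMATE */
estimate method=ml q=(1)(12) noprint;
run;
/* Export the data to do further diagnostic checking */
forecast out=ts.out lead=0 id=date;
run;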
quit;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.out;
plot residual*date;
run;
quit;
/* Perform the normality test */
proc univariate data=ts.out normal plot;
var residual;
run;
The sample ACF of residuals:
The sample PACF of residuals:
Ljung-Box test:
Diagnostic checking by overfitting the AR part and the MA part: compare the estimated
coefficients and compare the model criteria.
The time plot of the residuals:
Normality tests:
Distribution plot and Q-Q plot for normality:
5. Do forecasting with the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model
The SAS program:
/* Do forecasting with the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model */
proc arima data=ts.lair;
identify alpha=0.05 var=lair(1,12);
run;
estimate method=ml q=(1)(12) plot;
run;
forecast out=ts.out lead=24 id=date interval=month;
run;
quit;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.out;
plot lair*date=1 forecast*date=2 l95*date=3 u95*date=3/overlay;
run;
quit;
The result:
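The forecasts above are on the log scale. A minimal sketch of converting them back to passenger counts is given below; the data set name air_fc and the new variable names are illustrative. Exponentiating the forecast gives the median on the original scale, while the lognormal mean also uses the forecast standard error STD.

```sas
/* Convert log-scale forecasts back to the original passenger scale */
data air_fc;                              /* WORK.AIR_FC: illustrative name */
  set ts.out;
  air_median = exp(forecast);             /* median forecast */
  air_mean   = exp(forecast + std**2/2);  /* mean forecast under lognormality */
  air_l95    = exp(l95);                  /* the limits transform directly */
  air_u95    = exp(u95);
run;

/* List the 24 forecast months (the observed data end in December 1960) */
proc print data=air_fc;
  where date > '01dec1960'd;
  var date air_median air_mean air_l95 air_u95;
  format date monyy7.;
run;
```

The back-transformed limits are not symmetric around the forecast, which reflects the multiplicative seasonal pattern of the original series.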