Project 5 - Gas

Time Series Forecasting - Gas

Dr. Areej Aftab Siddiqui


PGP-BABI (June 2019)

Contents
1. PROJECT OBJECTIVE
2. EXPLORATORY DATA ANALYSIS – STEP BY STEP APPROACH
3. Exploratory Data Analysis
3.1 Analysis
3.2 Additive and Multiplicative model
4. Stationarity
5. Train and Test
6. Forecasting
6.1 Simple Forecasting
6.2 Exponential Forecasting on Gas
6.3 Holt's method
6.4 Removing seasonality from the time series and testing Holt's efficacy again
6.6 Holt-Winters
6.7 Holt-Winters multiplicative model
7. ARIMA and Auto ARIMA
8. Model Accuracy
8.1 Ljung-Box test
8.2 Box-Pierce test
8.3 Accuracy of the forecast

1. PROJECT OBJECTIVE

The objective of the report is to explore the Gas dataset and to forecast gas consumption using time
series models. The case study is solved using R programming. The report mainly consists of the
following:
i. Importing the dataset in R
ii. Examining the structure of the Gas dataset
iii. Graphically depicting the Gas dataset
iv. Examining the presence of stationarity
v. Forecast for 20 periods
vi. Check accuracy

2. EXPLORATORY DATA ANALYSIS – STEP BY STEP APPROACH


A Typical Data exploration activity consists of the following steps:
1. Environment Set up and Data Import
2. Variable Identification
3. Analysis
4. Missing Value Treatment
5. Forecasting
6. Accuracy Check

We shall follow these steps in exploring the provided dataset.

Environment Set Up and Data Import


Install necessary Packages and Invoke Libraries
The packages installed and libraries invoked for the current project are tidyverse, Metrics,
dplyr, vif, car, corrplot
Set up working Directory
The working directory is set to import and export data files. For the current project the
working directory was set in Documents in C Drive.
Import and Read the Dataset
The given dataset is in .xlsx format. Hence, the command read.xlsx is used for importing the
file.

Please refer to Appendix A for the source code.

3. Exploratory Data Analysis


3.1 Analysis
The frequency of the dataset is 12 (indicating months).
Table-1
Summary
Min.  1st Qu.  Median  Mean  3rd Qu.  Max.
1646  2287     3345    9425  15996    43558

There are no missing values. The dataset starts in January 1956 and ends in August 1995. In the summary
output the mean (9425) is far above the median (3345), which indicates that the data is right-skewed.
The monthly, quarterly and annual gas trends indicate that gas consumption rises after 1980, with a
high rate of growth after 1985.

On observing the Seasonality, it is seen that Gas consumption rises from April, is maximum in July
and then decreases.

The dataset has outliers in the months of March, May, July and August.

Boxplot

It is important to adjust the time series because months have different numbers of days. The series
was therefore adjusted for month length before its trend was plotted.
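
The report does not state how the adjustment was made. A minimal base R sketch, assuming the common approach of dividing each monthly total by the number of days in that month and rescaling to an average month of 365.25/12 days, could look like this. The helper `days_in_month` and the toy series `x` are hypothetical names used only for illustration:

```r
# Hypothetical helper: number of days in each month of a monthly ts object
days_in_month <- function(x) {
  yr <- as.integer(floor(time(x) + 1e-8))   # calendar year of each observation
  mo <- as.integer(cycle(x))                # month index 1..12
  first <- as.Date(sprintf("%d-%02d-01", yr, mo))
  next_yr <- ifelse(mo == 12L, yr + 1L, yr)
  next_mo <- ifelse(mo == 12L, 1L, mo + 1L)
  nxt <- as.Date(sprintf("%d-%02d-01", next_yr, next_mo))
  as.numeric(nxt - first)                   # days between consecutive month starts
}

# Toy series (assumption): 100 units per day during 1957-1958,
# so the raw monthly totals differ only because of month length
days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
x <- ts(rep(days * 100, 2), start = c(1957, 1), frequency = 12)

# Average daily value, rescaled to an "average month"
x_adj <- x / days_in_month(x) * (365.25 / 12)
```

After the adjustment every month of the toy series has the same value, which is the point: variation caused purely by month length is removed before the seasonal pattern is examined.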

3.2 Additive and Multiplicative model


In an additive model, the amplitude of the seasonal and irregular variations does not change as the
level of the trend rises or falls; in a multiplicative model, it changes in proportion to the level.
Decomposition is a tool that separates the different components of a time series (trend, seasonal
and irregular).
The decomposition of the additive series indicates a rising trend with similar seasonality each year.
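
As a rough illustration of the two decompositions, the sketch below uses base R's `decompose()`. The built-in `AirPassengers` series is only a stand-in (an assumption) so that the code runs without the project's data file; in the report the input would be the monthly gas series.

```r
# Stand-in data (assumption): AirPassengers ships with base R
x <- AirPassengers

add <- decompose(x, type = "additive")        # x = trend + seasonal + random
mul <- decompose(x, type = "multiplicative")  # x = trend * seasonal * random

# Recombining the components returns the original series.
# The centred moving-average trend is NA for the first and last six months.
recon_add <- add$trend + add$seasonal + add$random
recon_mul <- mul$trend * mul$seasonal * mul$random

# plot(add); plot(mul)   # uncomment to view the component panels
```

If the random component of the additive fit fans out as the level rises while that of the multiplicative fit stays even, the multiplicative form is usually the better description.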

A similar trend is observed in the case of the multiplicative time series.

Loess ("locally-weighted scatterplot smoothing") uses local regression. A window of a specified
width is placed over the data; the wider the window, the smoother the resulting loess curve. If the
seasonal pattern is believed to be constant through time, the seasonal window should be set to a
large value so that the entire dataset is used to estimate it.
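
The effect of the window can be sketched with base R's `stl()`, where the seasonal window is the `s.window` argument. The report does not show which call or window was used, so the values below are assumptions, and `AirPassengers` is again only a stand-in series:

```r
# Stand-in data (assumption): AirPassengers ships with base R
x <- AirPassengers

# s.window = "periodic": the seasonal pattern is forced to be identical every year
fit_fixed <- stl(x, s.window = "periodic")

# A narrow (odd) window lets the seasonal pattern change from year to year
fit_flex <- stl(x, s.window = 7)

seas_fixed <- fit_fixed$time.series[, "seasonal"]
seas_flex  <- fit_flex$time.series[, "seasonal"]

# plot(fit_fixed); plot(fit_flex)   # uncomment to compare the panels
```

With `s.window = "periodic"` the first twelve seasonal values simply repeat, whereas a narrow window lets the seasonal component drift over the years.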

Summary:
Seasonality is strong but consistent.
Trend: there was strong growth from 1950 to 1970, and the noise increased from 1990 onwards.
The additive and multiplicative decompositions give similar results, so the time series is treated
as additive here.
However, as the time series increases in magnitude, the seasonal variation increases as well, which
is characteristic of a multiplicative pattern.

4. Stationarity

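Stationarity is important in time series forecasting. The KPSS and ADF tests both indicate that the
time series is not stationary: the KPSS test rejects its null hypothesis of level stationarity
(p-value = 0.01), while the ADF test fails to reject its null hypothesis of a unit root
(p-value = 0.99).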

KPSS Test for Level Stationarity

data: gas.ts
KPSS Level = 4.3406, Truncation lag parameter = 5, p-value = 0.01

Augmented Dickey-Fuller Test

data: gas.ts
Dickey-Fuller = 0.73972, Lag order = 6, p-value = 0.99
alternative hypothesis: stationary

ACF and PACF plots (another method to observe stationarity)

It is seen that the dataset is not stationary.
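
A small sketch of the ACF check on a synthetic series (an assumption, not the gas data): a trending series has a lag-1 autocorrelation close to 1 that decays very slowly, and first differencing removes most of it. The printed test output in this section has the format produced by `kpss.test` and `adf.test` from the tseries package, so those calls are included as an optional step that runs only if the package is installed.

```r
set.seed(1)
# Synthetic non-stationary series (assumption): a random walk with drift
x <- ts(cumsum(rnorm(500, mean = 0.5)))

# Element 1 of $acf is lag 0, so element 2 is the lag-1 autocorrelation
acf_level <- acf(x, plot = FALSE)$acf[2]
acf_diff  <- acf(diff(x), plot = FALSE)$acf[2]
c(level = acf_level, differenced = acf_diff)

# Optional formal tests, only if tseries is available
if (requireNamespace("tseries", quietly = TRUE)) {
  print(suppressWarnings(tseries::kpss.test(x, null = "Level")))
  print(suppressWarnings(tseries::adf.test(x, alternative = "stationary")))
}
```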

5. Train and Test


Train data is taken from January 1970 to December 1993, and the test data covers 1994 only. A simple
moving average is used to visualise the time series.
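
The split can be sketched with base R's `window()`. The series below is a placeholder with dummy values that only mirrors the span of the gas data (January 1956 to August 1995), and the 12-month trailing average is an assumed choice of moving-average width, since the report does not state one:

```r
# Placeholder series with dummy values (assumption); same span as the gas data
gas.ts <- ts(seq_len(476), start = c(1956, 1), frequency = 12)

# Training set: Jan 1970 to Dec 1993; test set: the twelve months of 1994
gas.train <- window(gas.ts, start = c(1970, 1), end = c(1993, 12))
gas.test  <- window(gas.ts, start = c(1994, 1), end = c(1994, 12))

# Trailing 12-month simple moving average of the training data
# (the first 11 values are NA because a full window is not yet available)
sma12 <- stats::filter(gas.train, rep(1 / 12, 12), sides = 1)

# plot(gas.train); lines(sma12, col = "red")   # uncomment to visualise
```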

6. Forecasting
6.1 Simple Forecasting
Forecast plotted for 20 periods

Forecast plotted for 12 periods

Accuracy:
          ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
Test set  2153.958  4904.692  3809.181  5.203979  12.32112  0.703002  1.638023
The forecasts indicate a rising trend in the next 12 periods.
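
The main error measures in the accuracy tables of this report can be reproduced by hand. The sketch below uses the standard definitions with the error taken as actual minus forecast, so a positive ME means the forecasts were too low on average. The function name `accuracy_measures` and the two-point example are hypothetical:

```r
# Hypothetical helper computing the standard forecast error measures
accuracy_measures <- function(actual, forecast) {
  e <- actual - forecast                     # positive error = forecast too low
  c(ME   = mean(e),                          # mean error (bias)
    RMSE = sqrt(mean(e^2)),                  # root mean squared error
    MAE  = mean(abs(e)),                     # mean absolute error
    MPE  = mean(100 * e / actual),           # mean percentage error
    MAPE = mean(abs(100 * e / actual)))      # mean absolute percentage error
}

accuracy_measures(actual = c(100, 200), forecast = c(110, 180))
# ME = 5, RMSE = sqrt(250) (about 15.81), MAE = 15, MPE = 0, MAPE = 10
```

MPE can be zero even when every forecast is wrong, because errors of opposite sign cancel; MAPE and RMSE are therefore the safer measures for ranking models.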

6.2 Exponential Forecasting on Gas


Smoothing parameters:
alpha = 0.2

Initial states:
l = 1928.6188

sigma: 1702.584

AIC       AICc      BIC
5918.310  5918.352  5925.635

              ME       RMSE      MAE      MPE     MAPE    MASE   ACF1    Theil's U
Training set  395.33   1696.662  1061.25  1.618   14.300  1.079  0.804   NA
Test set      4131.69  6400.524  5097.60  11.789  16.201  5.18   0.7365  2.114935
The forecasts indicate a stagnant trend in the next 12 periods.
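
The flat forecast is a property of simple exponential smoothing itself: the method tracks only a level, so every future period receives the last smoothed level. A minimal hand-written sketch follows (the function name `ses_forecast` and the toy inputs are hypothetical; this is not the fitting routine behind the output above):

```r
# Hypothetical helper: simple exponential smoothing with a fixed alpha
ses_forecast <- function(y, alpha, l0 = y[1], h = 12) {
  level <- l0
  for (obs in y) {
    # new level = weighted average of the observation and the previous level
    level <- alpha * obs + (1 - alpha) * level
  }
  rep(level, h)   # the same value is forecast for every horizon
}

ses_forecast(c(10, 20, 30), alpha = 0.5, l0 = 10, h = 2)
# 22.5 22.5
```

With a small alpha such as the 0.2 reported above, the level reacts slowly to recent observations, which helps explain why the forecasts sit below a rising series.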

6.3 Holt's method

Smoothing parameters:
alpha = 0.9642
beta = 0.7143

Initial states:
l = 1818.4411
b = 74.9327

sigma: 951.2777

AIC       AICc      BIC
5587.001  5587.214  5605.316

              ME        RMSE      MAE        MPE      MAPE   MASE    ACF1      Theil's U
Training set  -8.404    944.648   575.2032   0.4822   7.538  0.5849  -0.00282  NA
Test set      17680.00  19818.83  17680.000  58.4034  58.40  17.979  0.777285  6.928428
The forecasts indicate a falling trend in the next 12 periods.
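
Holt's method adds a slope to the level and extrapolates it linearly. The sketch below is the textbook recursion written by hand; the function name `holt_forecast` and the parameter values are hypothetical, and the beta printed by a state-space fit may be parameterised differently, so the values reported above should not be expected to reproduce the report's forecasts in this function.

```r
# Hypothetical helper: Holt's linear trend method with fixed parameters
holt_forecast <- function(y, alpha, beta, l0, b0, h = 12) {
  level <- l0
  trend <- b0
  for (obs in y) {
    prev_level <- level
    level <- alpha * obs + (1 - alpha) * (prev_level + trend)   # update level
    trend <- beta * (level - prev_level) + (1 - beta) * trend   # update slope
  }
  level + seq_len(h) * trend   # straight-line extrapolation
}

holt_forecast(1:10, alpha = 0.8, beta = 0.2, l0 = 0, b0 = 1, h = 3)
# 11 12 13
```

Because the forecast is a straight line through the last level with the last estimated slope, large smoothing parameters make it follow the most recent movement. A plausible reading of the falling forecast above is that the training data ends on the seasonal decline from the July peak to December, which a non-seasonal method interprets as a downward trend.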

6.4 Removing seasonality from the time series and testing Holt's efficacy again

              ME            RMSE        MAE         MPE        MAPE      MASE        ACF1        Theil's U
Training set  -6.033996     755.8903    558.1407    4.162145   16.48117  0.5676028   0.03661692  NA
Test set      11335.864070  12711.9230  11335.8641  37.619269  37.61927  11.5280398  0.77404723  6.231689
The forecasts indicate a falling trend in the next 12 periods after removing seasonality.

6.6 Holt-Winters

Smoothing parameters:
alpha = 0.2596
beta = 0.0202
gamma = 0.7404
phi = 0.98

Initial states:
l = 2230.6766
b = 7.2177
s = -1034.848 -532.6302 71.5714 516.9939 1504.973 1781.418
1320.311 588.8141 -554.9172 -747.7943 -1420.407 -1493.485

sigma: 605.0447

AIC       AICc      BIC
5338.864  5341.407  5404.798

Training set error measures
              ME     RMSE    MAE      MPE     MAPE   MASE    ACF1
Training set  81.59  586.91  387.415  0.9087  7.175  0.3939  0.4235

      Jan       Feb       Mar       Apr       May       Jun       Jul       Aug
1994  21010.61  21398.96  23621.92  25059.73  27968.26  29980.18  33316.35  32392.95
      Sep       Oct       Nov       Dec
1994  28465.48  27323.58  25045.88  23600.47

accuracy(gashw.f1, gas.test)
              ME          RMSE       MAE       MPE        MAPE      MASE       ACF1       Theil's U
Training set  81.59189    586.9159   387.415   0.9087759  7.175919  0.3939828  0.4235979  NA
Test set      2233.13630  2602.5705  2233.136  7.2892297  7.289230  2.2709944  0.4241718  0.9189364
The forecasts follow the seasonal pattern over the next 12 periods.

6.7 Holt-Winters multiplicative model

Smoothing parameters:
alpha = 0.792
beta = 0.051
gamma = 2e-04
phi = 0.9797

Initial states:
l = 2034.7689
b = 3.6615
s = 0.8541 0.917 1.0114 1.0758 1.218 1.2715
1.1749 1.0929 0.9184 0.8838 0.7932 0.789

sigma: 0.0454

AIC       AICc      BIC
4773.207  4775.750  4839.140

Training set error measures:
              ME        RMSE      MAE       MPE        MAPE      MASE       ACF1
Training set  27.94532  434.7321  253.4229  0.3438495  3.183346  0.2577191  -0.07376761

              ME          RMSE       MAE        MPE        MAPE      MASE       ACF1         Theil's U
Training set  27.94532    434.7321   253.4229   0.3438495  3.183346  0.2577191  -0.07376761  NA
Test set      2197.42260  2341.3448  2197.4226  7.5092231  7.509223  2.2346753  0.07767108   0.8533396
The forecasts follow the seasonal pattern over the next 12 periods.
On the test set, the Holt-Winters multiplicative model has the lowest RMSE (2341.34) and MAE (2197.42)
of the models fitted so far. Its test-set MAPE (7.51) is marginally higher than that of the additive
Holt-Winters model (7.29), while its training-set MAPE (3.18) is much lower.
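
A comparable additive versus multiplicative comparison can be sketched with base R's `HoltWinters()`. This is a swapped-in technique for illustration: the output above includes a damping parameter (phi), which base `HoltWinters()` does not support, so it is not the call used in the report. `AirPassengers` is a stand-in series and the split year is an arbitrary assumption.

```r
# Stand-in data (assumption): hold out the final year of AirPassengers
x <- AirPassengers
train <- window(x, end = c(1959, 12))
test  <- window(x, start = c(1960, 1))

# Level, trend and seasonal smoothing parameters are estimated by HoltWinters()
fit_add <- HoltWinters(train, seasonal = "additive")
fit_mul <- HoltWinters(train, seasonal = "multiplicative")

# 12-step-ahead point forecasts
fc_add <- as.numeric(predict(fit_add, n.ahead = 12))
fc_mul <- as.numeric(predict(fit_mul, n.ahead = 12))

# Compare the two forms on the held-out year
mape <- function(actual, forecast) mean(abs(100 * (actual - forecast) / actual))
c(additive = mape(test, fc_add), multiplicative = mape(test, fc_mul))
```

Comparing the models on held-out data rather than on training error follows the same logic as the test-set rows in the tables above.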

7. ARIMA and Auto ARIMA

arima(x = gas.train, order = c(1, 1, 1))

Coefficients:
      ar1     ma1
      0.5134  -0.0584
s.e.  0.0850  0.0893

sigma^2 estimated as 728709: log likelihood = -2344.47, aic = 4694.94

Fitting with Auto ARIMA
ARIMA(1,1,0)

Coefficients:
      ar1
      0.4668
s.e.  0.0522

sigma^2 estimated as 732286: log likelihood=-2344.67
AIC=4693.34 AICc=4693.38 BIC=4700.66

Auto ARIMA selects the same p and d as the manual model but drops the MA term (q = 0 instead of 1),
and has a slightly lower AIC (4693.34 versus 4694.94).
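
The comparison of the two orders can be sketched with base R's `arima()` on a simulated series (an assumption; the gas training data is not used here). Automatic order selection in the report was done with Auto ARIMA; the sketch only shows the manual step of fitting both candidates and comparing AIC.

```r
set.seed(42)
# Synthetic ARIMA(1,1,0) series with ar1 = 0.5 (assumption, not the gas data)
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 300)

fit_111 <- arima(x, order = c(1, 1, 1))   # AR(1) and MA(1) terms, one difference
fit_110 <- arima(x, order = c(1, 1, 0))   # AR(1) term only, one difference

# Lower AIC is preferred; the extra MA term must earn its penalty
c(AIC_111 = AIC(fit_111), AIC_110 = AIC(fit_110))

# 12-step-ahead point forecasts from the simpler model
fc <- predict(fit_110, n.ahead = 12)$pred
```

Neither order includes a seasonal component, so forecasts from such models cannot reproduce a monthly seasonal pattern.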

8. Model Accuracy
8.1 Ljung-Box test
H0: Residuals are independent
Ha: Residuals are not independent
Box-Pierce test

data: gasarima$residuals
X-squared = 0.012044, df = 1, p-value = 0.9126

8.2 Box-Pierce test


data: fit$residuals
X-squared = 0.16217, df = 1, p-value = 0.6872
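Both p-values (0.9126 and 0.6872) are greater than 0.05, so H0 is not rejected: there is no evidence
of autocorrelation in the residuals at lag 1, and the residuals can be treated as independent.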
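
Both outputs above are labelled "Box-Pierce test", which is the default `type` of base R's `Box.test()`. A sketch of requesting each variant explicitly, using simulated white noise as a stand-in for model residuals (an assumption):

```r
set.seed(7)
res <- rnorm(200)   # stand-in for model residuals (assumption)

bp <- Box.test(res, lag = 1, type = "Box-Pierce")   # default type
lb <- Box.test(res, lag = 1, type = "Ljung-Box")    # small-sample correction

c(box_pierce = bp$p.value, ljung_box = lb$p.value)
```

The Ljung-Box statistic is never smaller than the Box-Pierce statistic for the same lags, so it is the slightly more conservative check. For monthly data it is also common to test more lags than one, for example `lag = 12`.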

8.3 Accuracy of the forecast


f7 = forecast(gasarima)
accuracy(f7, gas.test)
              ME          RMSE      MAE        MPE         MAPE       MASE       ACF1          Theil's U
Training set  33.71582    852.161   521.0072   0.4181353   6.813444   0.5298398  -0.006466775  NA
Test set      8224.60366  9645.597  8224.6037  26.3034427  26.303443  8.3640345  0.733738029   3.308731

f8 = forecast(fit)
accuracy(f8, gas.test)
              ME          RMSE       MAE        MPE         MAPE       MASE       ACF1         Theil's U
Training set  35.20191    852.7607   522.7044   0.4097849   6.844272   0.5315658  -0.02372956  NA
Test set      8036.47583  9464.9462  8036.4758  25.6577831  25.657783  8.1727174  0.73383680   3.24253

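The two ARIMA models fit the training data well (MAPE of about 6.8%), but on the test set their MAPE is
26.30% and 25.66%, considerably higher than that of the Holt-Winters models (about 7.3% to 7.5%). The
Holt-Winters forecasts are therefore the more accurate for this series.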
