Business Forecasting Using R


Business Analytics

Module 5: Introductory Time Series Analysis using R
Agenda

In this session, you will learn about:

• Basic time series and its components
• Moving averages (simple and exponential)
• R's built-in function ts()
• Plotting of time series
• Business forecasting using moving average methods
• The ARIMA model
• Application of the ARIMA model in business

Private and Confidential 2


Basics of Time Series



What is a Time Series

• A time series is a collection of data points recorded at successive time periods

• E.g. share price data for the last, say, two years
• Share price data comprises Open, High, Low and Close prices
• Below is the plot of the Infosys share price (close price) for the last 2 years

[Figure: Close Price - Infosys. Line chart of the close price (y-axis 1500 to 4500) against date (x-axis, dates between June 2013 and May 2015).]


Examples of Time Series and its importance

• Time series can be found virtually everywhere

• Apart from share prices (as shown in the last slide), we can observe time series such as
– Quarterly sales data of a company
– Monthly inflation numbers (i.e. CPI / WPI) as released by MOSPI
– Yearly employment data
– Your consumption of food over a period of, say, the last 5 years
• Importance of analyzing time series
– To understand a pattern of behavior (e.g. a person's consumption of fast food, mean reversion or random walk)
– To identify a trend
– To forecast from that trend (predictive modeling)



Components of a time series

• Any time series can be divided into three parts

– A trend component
– A seasonal component and
– A random component
• Trend component
– A common method for obtaining the trend component is to apply a linear filter to the given time series
– A simple class of linear filters is the moving average with equal weights


Components of a time series

[Figure: Water Consumption in London, UK, from 1983 to 1994 (y-axis 20000 to 45000, x-axis Jan-83 to Jan-94)]
• Chart annotations mark the random (irregular) component and the trend component
• Trend component: a three-month moving average is calculated as (M_1 + M_2 + M_3) / 3

From the graph above, a seasonal component appears to be present. Seasonality is detected with an autocorrelation test, which is discussed later.
Moving Averages
• As discussed earlier, moving averages are smoothing techniques applied to a time series
• Simple Moving Average
– Take the average of the first n consecutive elements (elements 1 to n), where the window n is at least 2 and smaller than the series length N
– Then slide the window forward one element at a time: the average of elements 2 to n+1, then 3 to n+2, and so on
– Consider a vector vec <- c(1:10)
– The SMA for n = 3 of vec would be a new vector, say vec.sma, which would be
• vec.sma <- c(2:9)
• Exponential Moving Average
– The popular method of exponential smoothing assigns geometrically decreasing weights to past observations, such that

$$\hat{x}_n = \alpha x_{n-1} + \alpha(1-\alpha) x_{n-2} + \alpha(1-\alpha)^2 x_{n-3} + \alpha(1-\alpha)^3 x_{n-4} + \dots$$

Ignoring the higher-order terms, we get

$$\hat{x}_n \approx \alpha x_{n-1} + \alpha(1-\alpha) x_{n-2}$$

where \hat{x}_n is the smoothed value for period n and \alpha is the smoothing parameter (between 0 and 1)
– In its basic form, exponential smoothing is applicable to time series with no systematic trend and/or seasonal components
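The two averages above can be sketched in base R as follows. This is a minimal illustration, assuming stats::filter() as a stand-in for a dedicated SMA function; the names sma.full, ema.weights and the value alpha = 0.5 are chosen here only for demonstration.

```r
# Simple moving average of 1:10 with window n = 3, using base R's stats::filter().
vec <- c(1:10)
n <- 3
sma.full <- stats::filter(vec, rep(1 / n, n), sides = 1)  # trailing window; first n - 1 values are NA
vec.sma <- as.numeric(sma.full[!is.na(sma.full)])
print(vec.sma)

# Exponential smoothing weights alpha * (1 - alpha)^k for k = 0, 1, 2, ...
alpha <- 0.5  # assumed smoothing parameter, for illustration only
ema.weights <- alpha * (1 - alpha)^(0:9)
print(round(ema.weights, 4))  # each weight is (1 - alpha) times the previous one
```

The weights shrink geometrically, which is why dropping the higher-order terms changes the smoothed value only slightly when alpha is not too small.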
The ts() function



ts() function

• The function ts() is used to create time-series objects

• The common syntax is
– ts(data, start, end, frequency, class = , names = )
• data -> a vector or matrix of the observed time-series values.
• start -> the time of the first observation. Either a single number or a vector of two integers, which specify a natural time unit and a (1-based) number of samples into the time unit.
• end -> the time of the last observation, specified in the same way as start.
• frequency -> the number of observations per unit of time.
• class -> class to be given to the result, or none if NULL or "none". The default is "ts" for a single series, c("mts", "ts", "matrix") for multiple series.
• names -> a character vector of names for the series in a multiple series: defaults to the column names of data, or Series 1, Series 2, ....
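As a small illustration of the start and frequency arguments, the sketch below builds a monthly series from made-up numbers; the vector values and the object name monthly are hypothetical and not part of the course data.

```r
# 24 hypothetical monthly observations (values are made up for illustration).
values <- c(12, 15, 14, 18, 21, 19, 23, 25, 24, 28, 30, 27,
            13, 16, 15, 20, 22, 21, 25, 27, 26, 30, 33, 29)

# start = c(2013, 1): year 2013, 1st sample within the year (January).
# frequency = 12: twelve observations per unit of time (one year).
monthly <- ts(values, start = c(2013, 1), frequency = 12)

print(monthly)
print(start(monthly))      # first time point as (year, month)
print(end(monthly))        # last time point as (year, month)
print(frequency(monthly))  # observations per year
```

When end is omitted, ts() derives it from start, frequency and the number of observations.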



ts() function
• An example
• Download a time series data set as described below
– Go to https://datamarket.com/data/list/?q=cat:g24 provider:tsdl
– Download a .csv / .xls file
– E.g. we have taken monthly advertising and sales data for 36 months from the ‘sales’ category
– Before applying the ts() function, change the date format from DD-MM to DD-MM-YYYY.
– Now apply ts() with start date 1st Jan 2013 and end date 1st Jan 2015; since the data are monthly, set frequency = 12
– When selecting the data for the time series, leave out the ‘date’ column. Hence in ts() type
<object name> <- ts(data[, 2:3], ...other arguments)
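The steps above can be sketched as follows. Since the downloaded file is not reproduced here, the data frame below is a hypothetical stand-in with a date column followed by two series; the column names, values and the object name ad.sales are assumptions.

```r
# Hypothetical stand-in for the downloaded file: a date column plus two series.
data <- data.frame(
  date        = format(seq(as.Date("2013-01-01"), by = "month", length.out = 25),
                       "%d-%m-%Y"),
  advertising = 20 + (1:25 %% 5),   # made-up values
  sales       = 10 + (1:25 %% 4)    # made-up values
)

# Leave out the date column (column 1); the time index comes from start/end/frequency.
ad.sales <- ts(data[, 2:3], start = c(2013, 1), end = c(2015, 1), frequency = 12)

print(class(ad.sales))     # a multiple time series
print(colnames(ad.sales))
print(end(ad.sales))
```

Individual series can then be picked by column name, e.g. ad.sales[, "sales"].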



Plotting time series



Plotting the time series

• We use the plot.ts() function for plotting time series

• Alternatively, we can use the general plot() function
• Syntax of the plot() function
– plot(time series, type, col, main, xlab, ylab)
• type = what type of plot should be drawn. Possible types include
» "p" for points,
» "l" for lines,
» "h" for ‘histogram’-like (or ‘high-density’) vertical lines,
» "s" for stair steps
• col = color of the plot
• main = heading / overall title of the chart
• xlab / ylab = label of the X / Y axis
– The output is given on the next slide
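A short sketch of these arguments on a made-up monthly series; the object name sales and its values are hypothetical. The pdf(NULL) call is only there so the sketch runs without a graphics window and can be dropped in an interactive session.

```r
# Hypothetical monthly series, used only to demonstrate the arguments.
sales <- ts(c(12, 15, 14, 18, 21, 19, 23, 25, 24, 28, 30, 27),
            start = c(2014, 1), frequency = 12)

pdf(NULL)  # null device: nothing is written to disk

# Line plot with a title and axis labels.
plot(sales, type = "l", col = "blue",
     main = "Monthly sales", xlab = "Month", ylab = "Sales")

# Same data drawn as vertical 'high-density' lines.
plot(sales, type = "h", col = "darkgreen", main = "Monthly sales, type = h")

invisible(dev.off())
```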

Business Forecasting using Moving Average



Moving Averages using R

• For simple moving averages we use the SMA() function, which is available in the TTR package
• Normal syntax
– SMA(time series[, "Specific Column"], n = 2,3,4,……), n being the number of periods to average over
• For exponential smoothing we use the HoltWinters() function
• Normal syntax
– HoltWinters(time series, alpha, beta, gamma)
– alpha = level; beta = trend; gamma = seasonal variation
– For most practical purposes, we do not specify alpha, beta and gamma
– In that case R estimates those parameters by minimizing the squared one-step prediction error
– To fit simple exponential smoothing without trend or seasonality, set beta = FALSE and gamma = FALSE
• Plotting the moving average along with the actual time series
– plot.ts(time series)
– lines(SMA(time series, n = ), col = "blue")
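The calls above can be tried on a made-up series as sketched below. To keep the example free of package dependencies it assumes stats::filter() in place of SMA(); TTR::SMA(x, n = 3) should give the same trailing averages if the TTR package is installed. The series x and its values are hypothetical.

```r
# Hypothetical non-seasonal monthly series (made-up values).
x <- ts(c(20, 23, 19, 25, 22, 27, 24, 21, 26, 23, 28, 25,
          22, 27, 24, 29, 26, 23, 28, 25, 30, 27, 24, 29),
        start = c(2013, 1), frequency = 12)

# 3-period trailing simple moving average (first two values are NA).
x.sma <- stats::filter(x, rep(1 / 3, 3), sides = 1)

# Simple exponential smoothing: trend (beta) and seasonality (gamma) switched off,
# alpha left unspecified so that R estimates it.
x.hw <- HoltWinters(x, beta = FALSE, gamma = FALSE)
print(x.hw$alpha)

pdf(NULL)                    # null device; drop this line when working interactively
plot.ts(x)                   # actual series
lines(x.sma, col = "blue")   # moving average overlaid on the same axes
invisible(dev.off())
```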



Moving Averages using R - Example

• Consider the earlier example


• Calculate the simple moving average using SMA() and set n = 3
• So what is the simple moving average of sales for the month of Dec 2014?
– 13
• And that of advertising?
– 20.066
• What would be the output of HoltWinters() for the advertising series?
– Smoothing parameters:
• alpha: 0.7946802
• beta : 0
• gamma: 0



• Blue line is the actual time series and black line is the 3-month simple moving average


Forecasting using Exponential Moving Averages
• Exponential smoothing can be used for forecasting.
• Recall the HoltWinters() function
– The alpha component is the smoothing control parameter and it varies from 0 to 1
– The higher the value of alpha, the more weight is placed on the most recent observations; a value close to 0 gives more weight to older observations and a smoother fit. A higher alpha does not by itself mean a better forecast.
– HoltWinters() by itself only produces fitted values for the time periods covered by the ts() object (i.e. it does not provide future values for periods outside that range)
– These fitted values are stored in a separate time series and can be accessed by HoltWinters(time series)$fitted
• We use the predict() function to forecast values for a specified number of future periods
• Syntax of predict()
– predict(HoltWinters(time series), n.ahead = )
– n.ahead being the number of time periods ahead of the last observation for which we want a forecast.
– E.g. predict(HoltWinters(time series), n.ahead = 12) returns the predicted values for the next 12 periods.
– The period length is the same as declared in ts(…, frequency = )
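A minimal sketch of the fitted values and of predict(), again on a made-up series (x and its values are hypothetical). Trend and seasonality are switched off here, so this is simple exponential smoothing rather than the full Holt-Winters model.

```r
# Hypothetical monthly series ending in Dec 2014 (made-up values).
x <- ts(c(20, 23, 19, 25, 22, 27, 24, 21, 26, 23, 28, 25,
          22, 27, 24, 29, 26, 23, 28, 25, 30, 27, 24, 29),
        start = c(2013, 1), frequency = 12)

fit <- HoltWinters(x, beta = FALSE, gamma = FALSE)

# In-sample fitted values: one per period covered by x (from the 2nd period on).
print(head(fit$fitted))

# Out-of-sample forecasts for the next 5 periods; the periods follow frequency = 12,
# so they are the five months after the last observation.
fc <- predict(fit, n.ahead = 5)
print(fc)
```

With no trend or seasonal term, the forecast is the same level for every future period; a series with a fitted trend gives forecasts that change from period to period.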



Forecasting using Exponential Moving Averages

• Example
• Consider the sales and advertising example
• The predict() function for the next 5 periods would give
           Feb       Mar       Apr      May      Jun
2015 14.054795 12.109590 10.164385 8.219180 6.273975



Decomposing a time series using R
• Recall that a time series comprises three parts, i.e. a trend, a seasonal and a random component
• We use the decompose() function, which stores the value of each of the three components for each time period
• Syntax of decompose()
– decompose(time series)
– We can see the values of each component
• decompose(time series)$trend or decompose(time series)$seasonal
• Subtracting decompose(time series)$seasonal from the time series generates another time series which is seasonally adjusted (for an additive decomposition, which is the default)
• We can also use the stl() function for decomposing a time series Xt: stl() determines the trend Tt using “loess” regression and then computes the seasonal component St (and the residuals et) from the differences Xt - Tt.
• Syntax of stl()
– stl(time series, s.window = "periodic", t.window = numeric)
– s.window -> seasonal window. Either "periodic" or an ODD number (e.g. 15, 39 etc.). The number controls how quickly the seasonal pattern is allowed to change over time
– t.window -> trend window. It is advisable to specify an appropriate ODD number
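Both decompositions can be sketched on R's built-in nottem series (monthly temperatures at Nottingham), used here as an assumption because the course data is not reproduced; the object names parts, adjusted and fit.stl are illustrative.

```r
x <- nottem  # built-in monthly series with a clear seasonal pattern

# Classical decomposition (additive by default): x = trend + seasonal + random.
parts <- decompose(x)
print(head(parts$seasonal, 12))   # one seasonal figure per month

# Seasonally adjusted series: remove the seasonal component from the original.
adjusted <- x - parts$seasonal

# Loess-based decomposition with a fixed ("periodic") seasonal pattern.
fit.stl <- stl(x, s.window = "periodic")
print(head(fit.stl$time.series))  # columns: seasonal, trend, remainder

pdf(NULL)          # null device; drop this line when working interactively
plot(parts)        # observed, trend, seasonal and random panels
plot(fit.stl)
invisible(dev.off())
```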



Decomposing a time series using R

• Plotting the decomposed time series
– Example of ad series


Exercise

• Open the attached Excel worksheet
• Plot the quarterly saving rate
• Calculate its mean and median
• Plot the 3-quarter moving average along with the actual time series
• What is the value of alpha? Does the series have any seasonality?
• Predict the savings rate for the next 5 months.

• You can also choose the capital expenditure series and work on it.


The ARIMA Model and its use in Business


What is ARIMA Model

• ARIMA -> Auto Regressive Integrated Moving Average


• Exponential smoothing methods are useful for making forecasts,
and make no assumptions about the correlations between
successive values of the time series.
• However, if you want to make prediction intervals for forecasts
made using exponential smoothing methods, the prediction
intervals require that the forecast errors are uncorrelated and are
normally distributed with mean zero and constant variance.
• Autoregressive Integrated Moving Average (ARIMA) models include
an explicit statistical model for the irregular component of a time
series, that allows for non-zero autocorrelations in the irregular
component.
• ARIMA models are defined for stationary time series. Therefore, if
you start off with a non-stationary time series, you will first need to
‘difference’ the time series until you obtain a stationary time series.



Creating appropriate ARIMA Model

• In order to make a time series stationary, you create another time series by iteratively taking differences of successive values
– First difference -> value @ T – value @ T-1
– Second difference -> the first difference of the first-differenced series, i.e. (value @ T – value @ T-1) – (value @ T-1 – value @ T-2), etc.
– So each additional round of differencing adds one to the order, d
• Formally, an ARIMA process is written as ARIMA(p,d,q), where
– p is the order of auto-regression
– q is the order of the moving average
– d is the order of differencing (as described above)
• It is very important to provide the correct values of p, q and d so that the fitted ARIMA process resembles the time series
– This helps in correct prediction or forecasting of the time series.
• We use the acf() and pacf() functions to determine ‘p’ and ‘q’.
• For ‘d’ we use the diff() function
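To make the order of differencing concrete, the sketch below uses a made-up series with a quadratic trend; the series x and the names d1, d2 and d2.by.hand are hypothetical. One round of differencing leaves a linear trend, and a second round removes it.

```r
# Hypothetical series with a quadratic trend: x_t = t^2 for t = 1..8.
x <- ts((1:8)^2)

d1 <- diff(x, differences = 1)   # value @ T minus value @ T-1
d2 <- diff(x, differences = 2)   # first difference of the first differences

print(d1)   # 3 5 7 9 11 13 15: still trending
print(d2)   # 2 2 2 2 2 2: constant, so d = 2 is enough for this series

# differences = 2 is the same as applying diff() twice.
d2.by.hand <- diff(diff(x))
```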



Creating appropriate ARIMA Model
• Normal syntax for diff()
– diff(time series, differences = any positive integer)
– E.g. diff(time series, differences = 4) -> differencing order 4 (i.e. d = 4)
– We can plot() the differenced series to visualize and interpret whether the time series has become stationary
• Normal syntax for acf() / pacf()
– acf(diff(time series, differences = ..), lag.max = 20) -> will plot the acf graph
– pacf(diff(time series, differences = ..), lag.max = 20) -> will plot the pacf graph
• Finding ‘p’
– From the pacf graph, identify the lag after which the pacf values drop to around zero and stay within the boundary (i.e. the 95% confidence bounds)
– Take that lag as the value of ‘p’
• Finding ‘q’
– From the acf graph, identify the lag after which the acf values drop to around zero and stay within the boundary (i.e. the 95% confidence bounds)
– Take that lag as the value of ‘q’
• You can use auto.arima() (from library("forecast")) to let R determine the values of ‘p’, ‘q’ and ‘d’
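The reading of the two graphs can also be done numerically, as sketched below. The built-in WWWusage series and the choice of one round of differencing are assumptions made for illustration; bound is the approximate 95% limit that acf() and pacf() draw as dashed lines.

```r
# Built-in series, differenced once (d = 1 is an assumption for this illustration).
d <- diff(WWWusage, differences = 1)

# Approximate 95% bounds for the correlations of a series of this length.
bound <- qnorm(0.975) / sqrt(length(d))

a <- acf(d, lag.max = 20, plot = FALSE)
p <- pacf(d, lag.max = 20, plot = FALSE)

acf.vals  <- as.numeric(a$acf)[-1]   # drop lag 0, which is always 1
pacf.vals <- as.numeric(p$acf)       # pacf starts at lag 1

# Lags whose correlation falls outside the bounds; the last of the leading
# significant lags is a candidate for q (from acf) or p (from pacf).
print(which(abs(acf.vals) > bound))
print(which(abs(pacf.vals) > bound))
```

Isolated spikes at high lags can occur by chance, so these lists are a starting point for candidate orders rather than a final answer.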
Creating appropriate ARIMA Model

• Consider the ad and sales example


• We can use auto.arima() function.
• The result is shown below
– ARIMA(0,0,1) with non-zero mean
– Coefficients:
         ma1  intercept
      0.8180    22.5756
s.e.  0.1118     1.5874

sigma^2 estimated as 19.71: log likelihood=-73.29
AIC=152.58 AICc=153.72 BIC=156.24
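• ARIMA(0,0,1) has d = 0, i.e. auto.arima() applied no differencing and treats the series as stationary; it is modeled as a first-order moving average with a non-zero mean.
• To cross-check, follow the steps mentioned in the earlier slide: difference the series with diff() only if it does not look stationary
– Then use acf() / pacf() to identify ‘p’ and ‘q’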



Step by Step guide to create ARIMA(p,d,q) Model

• Get the time series -> ts(read.csv(…..))

• Fit exponential smoothing to that time series -> HoltWinters(time series)
– Store the fitted model as, say, ts_forecast; the EMA data points are in ts_forecast$fitted and can be plotted
• Plot the residuals of ts_forecast -> residuals(ts_forecast)
• From the graph, get an idea whether the residuals have a constant variance.
– You can test the residuals for autocorrelation with a Box test
– Box.test(residuals(ts_forecast), lag = 20, type = "Ljung-Box")
– If the p value is more than 0.05, there is no evidence of autocorrelation in the residuals, so exponential smoothing already captures the structure in the series
– If the p value is less than 0.05, the residuals are still autocorrelated, and a model that allows for this, such as ARIMA, is likely to improve on it. The test by itself says nothing about stationarity
• Use auto.arima(time series) to get the values of p, d and q.
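The residual check in the steps above can be sketched on R's built-in Nile series, which stands in for the course data as an assumption; ts_forecast follows the name used on the slide, and res and bt are illustrative.

```r
# Fit simple exponential smoothing (no trend, no seasonality) to a built-in series.
ts_forecast <- HoltWinters(Nile, beta = FALSE, gamma = FALSE)

# Residuals = observed minus one-step-ahead fitted values.
res <- residuals(ts_forecast)

pdf(NULL)   # null device; drop this line when working interactively
plot(res)   # look for roughly constant spread around zero
invisible(dev.off())

# Ljung-Box test for autocorrelation in the residuals up to lag 20.
bt <- Box.test(res, lag = 20, type = "Ljung-Box")
print(bt$p.value)   # compare against 0.05 as described above
```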



Forecasting ARIMA series
• Normal syntax for arima()
– arima(time series, order = c(p,d,q), seasonal = list(order = c(P,D,Q)))
• This would show various results like coefficients, sigma^2, log likelihood and AIC (auto.arima() additionally reports AICc and BIC).
• Normal syntax for auto.arima()
– auto.arima(time series)
• Forecasting an ARIMA series
– forecast(fitted ARIMA model, h = any number, level = c(99.5)), from the forecast package (called forecast.Arima() in older versions of the package)
– h = prediction period, level = confidence level of the prediction interval
– You can also use the predict() function -> predict(arima(time series, order = c(p,d,q)), n.ahead = any number)
• Considering the same ad and sales example
– Below would be the output for the next 3 months' ad using forecast() with a 99.5% confidence level
         Point Forecast  Lo 99.5  Hi 99.5
Feb 2015       19.02734 6.565788 31.48889
Mar 2015       22.57563 6.476179 38.67509
Apr 2015       22.57563 6.476179 38.67509
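The same kind of forecast can be sketched with base R only, using arima() and predict(). The built-in lh series and the order c(0, 0, 1) are assumptions chosen to mirror the ARIMA(0,0,1) model above; the interval is built by hand from the standard errors, assuming normally distributed forecast errors.

```r
# Fit an MA(1) model with a non-zero mean to a built-in series.
fit <- arima(lh, order = c(0, 0, 1))
print(fit)   # coefficients, sigma^2, log likelihood, AIC

# Point forecasts and standard errors for the next 3 periods.
pred <- predict(fit, n.ahead = 3)

# 99.5% prediction interval: point forecast +/- z * standard error.
z <- qnorm(1 - (1 - 0.995) / 2)
lower <- pred$pred - z * pred$se
upper <- pred$pred + z * pred$se
print(cbind(point = pred$pred, lower, upper))
```

For an MA(1) model only the first forecast uses the last residual; from the second period onward the point forecast equals the estimated mean, which is why the Mar and Apr rows in the table above are identical.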



Forecasting ARIMA series - Exercise

• Please refer to the following example (attached CSV file)
• Is the series stationary?
• If not, convert the series into a stationary one using the diff() function
• What would be the order of that series in an ARIMA model?
• Plot the ARIMA series
• Predict values for the next 2 quarters


Q&A

Thank You for Your Attention