Week06 Regression-Based Forecasting

The document discusses regression-based forecasting techniques in time series analysis, including linear, exponential, and polynomial trends, as well as handling seasonality. It emphasizes the difference between descriptive modeling and forecasting, and provides examples using Amtrak ridership data. The summary highlights the use of regression models for various trends and the incorporation of seasonal effects through categorical variables.

Uploaded by

orhan sivrikaya

Ministry of Education

Humber College

BIA 5302-Machine Learning and Programming 2


Week 06: Regression-Based Forecasting

Dr. Raed Karim


Dr. Salam Ismaeel
Agenda

• Basic Ideas
• Regression-Based Forecasting
✓ Linear Trend
✓ Exponential Trend
✓ Polynomial Trend
✓ Handling Seasonality
• Summary
• Next Week's Midterm

Basic Idea

• Modeling time series data is done for either descriptive or predictive purposes.

• In descriptive modeling, or time series analysis, a time series is modeled to determine its components
in terms of seasonal patterns, trends, relation to external factors, etc.

✓ These can then be used for decision-making and policy formulation.

• In contrast, time series forecasting uses the information in a time series (and perhaps other
information) to forecast the future values of that series.

• The difference between the goals of time series analysis and time series forecasting leads to differences
in the type of methods used and in the modeling process itself.
Basic Idea (cont.)

• Time series forecasting uses the information in a time series to forecast future values of that series.

• Time series analysis models a time series to determine its components in terms of seasonal patterns,
trends, relation to external factors, etc.

• Time series forecasting methods: regression models vs. smoothing (data-driven) models.

• With both types of methods, and in general, a time series can be described in terms of four components:
level, trend, seasonality, and noise.

• Regression-based forecasting: fit a linear trend, with time as the predictor

• Modify and use it also for non-linear trends

✓ Exponential
✓ Polynomial
• Can also capture seasonality
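
The four components listed above can be illustrated with a small synthetic series. This is only a sketch: the level, slope, seasonal amplitude, and noise scale are made-up values, not estimates from the Amtrak data.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is reproducible
n_months = 48                            # hypothetical: four years of monthly data
t = np.arange(1, n_months + 1)           # time index 1..n

level = 1700.0                                      # hypothetical baseline value
trend = 2.0 * t                                     # hypothetical linear growth per month
seasonality = 100.0 * np.sin(2 * np.pi * t / 12)    # pattern that repeats every 12 months
noise = rng.normal(0.0, 20.0, n_months)             # random variation

# An additive series is the sum of the four components
series = level + trend + seasonality + noise
```

A regression-based forecast tries to estimate the level, trend, and seasonality terms and treats what is left over as noise.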
Import Required Packages
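import math
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
import statsmodels.formula.api as sm
from statsmodels.tsa import tsatools, stattools
# ARIMA now lives in statsmodels.tsa.arima.model; the older
# statsmodels.tsa.arima_model.ARIMA is no longer supported in recent statsmodels releases
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics import tsaplots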
A Model with Trend - Linear Trend
• To create a linear regression model that captures a time series with a global linear trend, the outcome
variable (Y) is set as the time series values or some function of them, and the predictor (X) is set as a
time index.

Example 1: fitting a linear trend to the Amtrak ridership data.
Example 1: Ridership on Amtrak Trains Data
• Amtrak, a US railway company, routinely collects data on ridership.
• The data contain a series of monthly ridership between January 1991 and March 2004.
• Here we focus on forecasting future ridership using this monthly series.

# Load the Amtrak data for time series analysis
# (the squeeze argument was removed in pandas 2.0 and is not needed here)
Amtrak_df = pd.read_csv('Amtrak.csv')
Amtrak_df.head(9)

print(Amtrak_df)
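Example 1: (cont.)

# Create a 'Date' column with a datetime data type
Amtrak_df['Date'] = pd.to_datetime(Amtrak_df.Month, format='%d/%m/%Y')
Amtrak_df.head(9)

# Convert to a time series indexed by date, then plot it (pandas version)
ridership_ts = pd.Series(Amtrak_df.Ridership.values, index=Amtrak_df.Date, name='Ridership')
ridership_ts.plot(ylim=[1300, 2300], legend=False)
plt.xlabel('Year'); plt.ylabel('Ridership (in 000s)')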
Example 1: Linear Trend
A linear fit to Amtrak ridership data
(Doesn’t fit too well; more later)
The Regression Model
Ridership Y is a function of time (t) and noise (error = e)

Y_t = B0 + B1*t + e_t

Thus we model 3 of the 4 components:

✓ Level (B0)
✓ Trend* (B1)
✓ Noise (e)

*Our trend model is linear, which the graph shows is not a good fit (more later)
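
As a rough illustration of the model above, the sketch below estimates the level (B0) and trend (B1) by ordinary least squares on synthetic data. The true values 1750 and 0.35 and the noise scale are invented for the example, and NumPy's least-squares solver stands in for statsmodels so the snippet runs without the Amtrak file.

```python
import numpy as np

rng = np.random.default_rng(1)                       # fixed seed for reproducibility
t = np.arange(1, 121)                                # time index for 120 hypothetical months
y = 1750.0 + 0.35 * t + rng.normal(0, 5, t.size)     # made-up level, slope, and noise

# Design matrix with a constant column (level) and the time index (trend)
X = np.column_stack([np.ones(t.size), t])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = b0 + b1 * t          # estimated level + trend
residuals = y - fitted        # what the model attributes to noise
```

With an intercept in the model, the residuals sum to approximately zero, and b0 and b1 should land close to the values used to generate the data.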

Example 1: Regression Model

# load data and convert to time series
# (name='Ridership' so that the formula below can refer to the column by name)
Amtrak_df = pd.read_csv('Amtrak.csv')
Amtrak_df['Date'] = pd.to_datetime(Amtrak_df.Month, format='%d/%m/%Y')
ridership_ts = pd.Series(Amtrak_df.Ridership.values, index=Amtrak_df.Date, name='Ridership')

# fit a linear trend model to the time series
# (trend='ct' adds a 'const' column and a 'trend' column holding the time index)
ridership_df = tsatools.add_trend(ridership_ts, trend='ct')
ridership_lm = sm.ols(formula='Ridership ~ trend', data=ridership_df).fit()

# plot the time series with the fitted trend line
ax = ridership_ts.plot()
ax.set_xlabel('Time')
ax.set_ylabel('Ridership (in 000s)')
ax.set_ylim(1300, 2300)
ridership_lm.predict(ridership_df).plot(ax=ax)
plt.show()

https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html

Applying the model to partitioned data

# fit a linear model using the training set and predict on the validation set
# (train_df and valid_df are the earlier and later partitions of ridership_df)
ridership_lm = sm.ols(formula='Ridership ~ trend', data=train_df).fit()
predict_df = ridership_lm.predict(valid_df)

Figure: ridership (top panel) and forecast errors (bottom panel) for the training and validation periods.
The trend based on the training data underestimates the validation period.
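
The partition-then-validate step can be sketched on synthetic data as follows. The series shape, the 12-period validation length, and the helper name fit_linear_trend are all assumptions made for illustration; none of them comes from the Amtrak example.

```python
import numpy as np

def fit_linear_trend(t, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*t."""
    slope, intercept = np.polyfit(t, y, deg=1)
    return intercept, slope

rng = np.random.default_rng(2)
t = np.arange(1, 61)
# Hypothetical series that declines first and then grows
y = 1800.0 - 6.0 * t + 0.15 * t**2 + rng.normal(0, 5, t.size)

n_valid = 12                         # hypothetical length of the validation period
n_train = t.size - n_valid
t_train, y_train = t[:n_train], y[:n_train]   # earlier periods: training
t_valid, y_valid = t[n_train:], y[n_train:]   # later periods: validation

b0, b1 = fit_linear_trend(t_train, y_train)
valid_errors = y_valid - (b0 + b1 * t_valid)  # forecast error = actual - forecast
rmse = np.sqrt(np.mean(valid_errors**2))
```

Because the synthetic series curves upward, the straight line fitted on the training period falls below the later values, so the validation errors are positive on average, which mirrors the underestimation noted for the figure.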

Implementation
def singleGraphLayout(ax, ylim, train_df, valid_df):
    ax.set_xlim('1990', '2004-6')
    ax.set_ylim(*ylim)
    ax.set_xlabel('Time')
    one_month = pd.Timedelta('31 days')
    xtrain = (min(train_df.index), max(train_df.index) - one_month)
    xvalid = (min(valid_df.index) + one_month, max(valid_df.index) - one_month)
    xtv = xtrain[1] + 0.5 * (xvalid[0] - xtrain[1])
    ypos = 0.9 * ylim[1] + 0.1 * ylim[0]
    ax.add_line(plt.Line2D(xtrain, (ypos, ypos), color='black', linewidth=0.5))
    ax.add_line(plt.Line2D(xvalid, (ypos, ypos), color='black', linewidth=0.5))
    ax.axvline(x=xtv, ymin=0, ymax=1, color='black', linewidth=0.5)

    ypos = 0.925 * ylim[1] + 0.075 * ylim[0]
    ax.text('1995', ypos, 'Training')
    ax.text('2002-3', ypos, 'Validation')

Implementation (cont.)
def graphLayout(axes, train_df, valid_df):
    singleGraphLayout(axes[0], [1300, 2550], train_df, valid_df)
    singleGraphLayout(axes[1], [-550, 550], train_df, valid_df)
    train_df.plot(y='Ridership', ax=axes[0], color='C0', linewidth=0.75)
    valid_df.plot(y='Ridership', ax=axes[0], color='C0', linestyle='dashed',
                  linewidth=0.75)
    axes[1].axhline(y=0, xmin=0, xmax=1, color='black', linewidth=0.5)
    axes[0].set_xlabel('')
    axes[0].set_ylabel('Ridership (in 000s)')
    axes[1].set_ylabel('Forecast Errors')
    if axes[0].get_legend():
        axes[0].get_legend().remove()

fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(9, 7.5))

# top panel: fitted values (training, solid) and forecasts (validation, dashed)
ridership_lm.predict(train_df).plot(ax=axes[0], color='C1')
ridership_lm.predict(valid_df).plot(ax=axes[0], color='C1', linestyle='dashed')

# bottom panel: residuals (training, solid) and forecast errors (validation, dashed)
residual = train_df.Ridership - ridership_lm.predict(train_df)
residual.plot(ax=axes[1], color='C1')
residual = valid_df.Ridership - ridership_lm.predict(valid_df)
residual.plot(ax=axes[1], color='C1', linestyle='dashed')
graphLayout(axes, train_df, valid_df)
plt.tight_layout()
plt.show()

Example 1: Summary
Summary: Linear model output
(training data)

ridership_lm.summary()

Partial output

                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1750.3595     29.073     60.206      0.000    1692.802    1807.917
trend          0.3514      0.407      0.864      0.390      -0.454       1.157

Exponential Trend

• Appropriate model when the increase/decrease in the series over time is multiplicative

e.g., the value at t1 is x% more than the value at t0, the value at t2 is x% more than the value at t1, ...

• Replace Y with log(Y), then fit a linear regression

log(Y_t) = B0 + B1*t + e_t
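
The log transformation can be sketched on a noise-free synthetic series, which makes the back-transformation step explicit. The starting value of 1000 and the growth rate of 0.02 per period are hypothetical.

```python
import numpy as np

t = np.arange(1, 25)
growth = 0.02                                  # hypothetical growth rate per period
y = 1000.0 * np.exp(growth * t)                # multiplicative growth, no noise

# Fit log(Y) = b0 + b1*t with a straight line
b1, b0 = np.polyfit(t, np.log(y), deg=1)

forecast_log = b0 + b1 * 25                    # forecast for period 25 on the log scale
forecast = np.exp(forecast_log)                # back-transform to the original units
pct_change = (np.exp(b1) - 1) * 100            # implied percentage change per period
```

The regression output is on the log scale, so forecasts must be exponentiated before they are compared with the original series or used to compute forecast errors.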

Example 1: Exponential Trend

Fitting the exponential trend model, making predictions

# linear trend, for comparison
ridership_lm_linear = sm.ols(formula='Ridership ~ trend', data=train_df).fit()
predict_df_linear = ridership_lm_linear.predict(valid_df)

# exponential trend: regress log(Ridership) on the time index
ridership_lm_expo = sm.ols(formula='np.log(Ridership) ~ trend', data=train_df).fit()
# these predictions are on the log scale; apply np.exp to return to ridership units
predict_df_expo = ridership_lm_expo.predict(valid_df)

Figure: the exponential trend (green) is very similar to the linear trend (orange); neither copes well with
an initial period of decline followed by a growth period.

Polynomial Trend

• Add additional predictors as appropriate
• For example, for a quadratic relationship add a t^2 predictor
• Fit linear regression using both t and t^2

Example: Fitting a quadratic model
ridership_lm_poly = sm.ols(formula='Ridership ~ trend + np.square(trend)',
                           data=train_df).fit()

Figure: the quadratic trend does a better job capturing the trend, though it over-forecasts in the
validation period.
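
To see why adding a squared time term helps when a series declines and then grows, the sketch below compares the training fit of a linear and a quadratic trend on synthetic data. The coefficients used to generate the series are invented, and the helper fit_sse is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(1, 61, dtype=float)
# Hypothetical U-shaped series: decline followed by growth
y = 1800.0 - 6.0 * t + 0.15 * t**2 + rng.normal(0, 5, t.size)

X_lin = np.column_stack([np.ones_like(t), t])           # predictors: constant, t
X_quad = np.column_stack([np.ones_like(t), t, t**2])    # predictors: constant, t, t^2

def fit_sse(X, y):
    """Fit by least squares; return (sum of squared errors, coefficients)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2)), coef

sse_lin, _ = fit_sse(X_lin, y)
sse_quad, coef_quad = fit_sse(X_quad, y)
```

The model is still linear in its coefficients, which is why ordinary linear regression can fit it. A lower training error does not guarantee better forecasts, so the comparison should also be made on a validation period.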

Handling Seasonality
• Seasonality is any recurring cyclical pattern of consistently higher or lower values (daily, weekly,
monthly, quarterly, etc.)

• Handle it in regression by adding a categorical variable for the season, e.g., 11 dummies for the month
(using all 12 together with an intercept would produce perfect multicollinearity)
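
The dummy-variable point above can be checked numerically. This sketch builds month dummies for a hypothetical three-year monthly index and compares the rank of the design matrix with 12 versus 11 dummies; the date range is made up and no ridership values are involved.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2000-01-01', periods=36, freq='MS')   # hypothetical 3 years, monthly
month = pd.Series(dates.month, index=dates, name='Month')

all_12 = pd.get_dummies(month, prefix='M').astype(float)                     # one column per month
only_11 = pd.get_dummies(month, prefix='M', drop_first=True).astype(float)   # January is the baseline

const = np.ones((len(month), 1))                   # intercept column
X_12 = np.hstack([const, all_12.to_numpy()])       # 13 columns
X_11 = np.hstack([const, only_11.to_numpy()])      # 12 columns

rank_12 = np.linalg.matrix_rank(X_12)   # less than 13: the 12 dummies sum to the intercept column
rank_11 = np.linalg.matrix_rank(X_11)   # equals 12: full column rank
```

With 11 dummies, each coefficient is read as the difference between that month and the omitted baseline month.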

Adding seasonality

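ridership_df = tsatools.add_trend(ridership_ts, trend='c')
ridership_df['Month'] = ridership_df.index.month

# partition the data
# (nTrain is the number of periods in the training set and must be set beforehand)
train_df = ridership_df[:nTrain]
valid_df = ridership_df[nTrain:]

# C(Month) treats Month as categorical, so the formula creates the dummy variables automatically
ridership_lm_season = sm.ols(formula='Ridership ~ C(Month)', data=train_df).fit()
ridership_lm_season.summary()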

Example 1: Model with Seasonality

Summary

Regression-Based Forecasting:

• Can use linear regression for exponential trends (take logs of Y) and polynomial trends (add powers of t as predictors)

• For seasonality, use a categorical variable (make dummies)

Ministry of Education
Humber College

BIA 5302-Machine Learning and Programming 2


Week 07: Midterm (In-Person Only)

Dr. Raed Karim


Dr. Salam Ismaeel
