A Quick Start of Time Series
Forecasting with a Practical
Example using FB Prophet
Yang Lyla
Jan 3 · 9 min read
Table of Contents
1. Introduction
Time Series Analysis
Why Facebook Prophet?
2. The Prophet Forecasting Model
Saturating growth
Trend Changepoints
Seasonality, Holiday Effects, And Regressors
3. Case study: forecasting advertising spend with Prophet
4. Closing Summary
1.1 Time Series Analysis
Time series analysis is an approach to analyzing time series data
in order to extract meaningful characteristics and generate other
insights that are useful in business situations. Generally, time
series data is a sequence of observations stored in time order. It
often comes up when tracking business metrics, monitoring
industrial processes, and so on.
Time series analysis helps us understand the time-based patterns of
a set of metric data points, which is critical for any business.
Time series forecasting techniques can answer business questions
such as how much inventory to maintain, how much website traffic to
expect in your e-store, and how many products will be sold in the
next month. All of these are important time series problems to solve.
The basic objective of time series analysis is usually to determine a
model that describes the pattern of the time series and can be
used for forecasting.
Classical time series forecasting techniques build on statistical
models, which require a lot of effort to tune and expertise in both
the data and the industry. When a forecasting model doesn’t perform
as expected, someone has to tune the parameters of the method for
the specific problem. Tuning these methods requires a thorough
understanding of how the underlying time series models work. It’s
difficult for organizations without data science teams to handle
that kind of forecasting. And it might not seem profitable for an
organization to keep a group of experts on board if there is no
need to build a complex forecasting platform or other services.
1.2 Why Facebook Prophet?
Facebook developed and open-sourced Prophet, a forecasting tool
available in both Python and R. It provides intuitive parameters
that are easy to tune. Even someone who lacks deep expertise in
time-series forecasting models can use it to generate meaningful
predictions for a variety of problems in business scenarios.
From the Facebook Prophet website:
“ Producing high quality forecasts is not an easy problem for either
machines or for most analysts. We have observed two main
themes in the practice of creating a variety of business forecasts:
Completely automatic forecasting techniques can be
brittle and they are often too inflexible to incorporate
useful assumptions or heuristics.
Analysts who can produce high quality forecasts are
quite rare because forecasting is a specialized data
science skill requiring substantial experience. ”
1.3 Highlights of Facebook Prophet
Very fast, since it’s built in Stan, a programming
language for statistical inference written in C++.
An additive regression model where non-linear trends
are fit with yearly, weekly, and daily seasonality, plus
holiday effects:
1. A piecewise linear or logistic growth curve trend. Prophet
automatically detects changes in trends by selecting
changepoints from the data.
2. A yearly seasonal component modeled using Fourier series.
3. A weekly seasonal component using dummy variables.
4. A user-provided list of important holidays.
Robust to missing data and shifts in the trend, and
typically handles outliers well.
An easy procedure to tweak and adjust the forecast while
adding domain knowledge or business insights.
2.1 The Prophet Forecasting Model
Prophet uses a decomposable time series model with three
main model components: trend, seasonality, and holidays. They
are combined in the following equation:
y(t)= g(t) + s(t) + h(t) + εt
g(t): piecewise linear or logistic growth curve for
modeling non-periodic changes in time series
s(t): periodic changes (e.g. weekly/yearly seasonality)
h(t): effects of holidays (user provided) with irregular
schedules
εt: error term accounts for any unusual changes not
accommodated by the model
Using time as a regressor, Prophet tries to fit several
linear and non-linear functions of time as
components. Modeling seasonality as an additive
component is the same approach taken by exponential
smoothing in the Holt-Winters technique. Prophet
frames the forecasting problem as a curve-fitting
exercise rather than looking explicitly at the time-based
dependence of each observation within a time series.
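To make the additive structure concrete, here is a minimal NumPy sketch that builds a toy series from a trend, a weekly seasonal term, a holiday effect, and noise. All the numbers (slope, amplitude, holiday positions) are made up for illustration; this is not Prophet's fitting procedure, only the shape of the model it fits.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(140)  # 140 daily observations (20 full weeks)

g = 100.0 + 0.5 * t                              # g(t): linear trend
s = 10.0 * np.sin(2 * np.pi * t / 7)             # s(t): weekly seasonality, period 7
h = np.where(np.isin(t, [30, 90]), -25.0, 0.0)   # h(t): dips on two made-up holidays
eps = rng.normal(0.0, 2.0, size=t.size)          # error term

y = g + s + h + eps  # additive model: y(t) = g(t) + s(t) + h(t) + error

# Because the components are simply added, subtracting three of them
# from y leaves the fourth.
recovered_trend = y - s - h - eps
```

Curve fitting in this setting means estimating g, s and h from y alone, which is why the order of observations matters less here than in models built on lagged values.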
2.2 Saturating growth
Set a carrying capacity, cap, to specify the maximum
achievable point given the business scenario or
constraints: market size, total population size,
maximum budget, etc.
A saturating minimum can also be set; it is specified with a
column floor in the same way as the cap column specifies
the maximum.
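As a rough sketch of how cap and floor are passed in, the helper below adds both columns to a data frame, and a second function shows where they would be used with a logistic-growth model. The helper names, the toy history and the cap and floor values are all hypothetical; the Prophet import is kept inside the function so the data preparation can be tried without the library installed.

```python
import pandas as pd

def add_capacity(df, cap, floor=0.0):
    """Return a copy of df with the 'cap' and 'floor' columns used by logistic growth."""
    out = df.copy()
    out["cap"] = cap
    out["floor"] = floor
    return out

def fit_logistic(df, cap, floor=0.0, periods=30):
    """Fit a logistic-growth model; df needs 'ds' (date) and 'y' (value) columns."""
    from prophet import Prophet  # published as 'fbprophet' before version 1.0
    m = Prophet(growth="logistic")
    m.fit(add_capacity(df, cap, floor))
    # The future frame needs the same cap/floor columns as the history
    future = add_capacity(m.make_future_dataframe(periods=periods), cap, floor)
    return m.predict(future)

# Toy history: 10 days of made-up values
history = pd.DataFrame({
    "ds": pd.date_range("2018-01-01", periods=10, freq="D"),
    "y": [float(v) for v in range(10, 20)],
})
capped = add_capacity(history, cap=50.0, floor=5.0)
```

The cap can also be a column that changes from row to row, which is useful when the ceiling itself grows over time.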
2.3 Trend Changepoints
The model can overfit or underfit when working with the trend
component. Prophet has built-in changepoints, and as the number of
changepoints allowed is increased, the fit becomes more flexible.
Here you can apply your business insights nicely: a big jump in
sales during holidays, a planned cost reduction in the future, and
so on. A user can also feed the changepoints in manually using those
business insights if required. In the plot below, the dotted
lines represent the changepoints for the given time series.
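A small sketch of how those insights might be turned into model settings follows. The helper names and the two dates are hypothetical; changepoints and changepoint_prior_scale are Prophet constructor arguments, and 0.05 is the library's default prior scale.

```python
def trend_settings(changepoint_dates=None, prior_scale=0.05):
    """Collect trend-related keyword arguments for the Prophet constructor."""
    settings = {"changepoint_prior_scale": prior_scale}  # larger -> more flexible trend
    if changepoint_dates:
        # Dates where the business expects the trend to shift
        settings["changepoints"] = sorted(changepoint_dates)
    return settings

def build_model(settings):
    """Create an unfitted model from the settings above."""
    from prophet import Prophet  # published as 'fbprophet' before version 1.0
    return Prophet(**settings)

# Hypothetical dates supplied by someone who knows the business
settings = trend_settings(["2018-07-01", "2018-01-01"], prior_scale=0.5)
```

If no dates are supplied, the changepoints key is left out and Prophet chooses candidate changepoints on its own.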
2.4 Seasonality, Holiday Effects, And Regressors
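Seasonal effects s(t) are approximated by the following function:
$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)$$
where P is the period of the seasonality (for example, 7 for weekly
data) and N is the number of Fourier terms.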
Prophet has a built-in holiday feature which allows inputs of
customized recurring events.
Finally, action time!
3. Case study: forecasting advertising spend with
Prophet in Python
I took sample advertising spend data from a digital marketing
platform and deliberately altered it so that it is a ‘fake’ data
source for use in this case study.
Here, we try to use the last 17 months of data to predict the next
30 days of ad spend.
Step 1: Import libraries and data set:
[Code]:
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np
from fbprophet import Prophet  # the package is named 'prophet' from version 1.0 onward
%matplotlib inline
import matplotlib.pyplot as plt
sample = pd.read_csv('/…/ad_spend.csv')
Step 2: Check data info
[Code]:
From the above, the data set contains a year and a half of daily
advertising spend, from 2017-06-01 to 2018-12-30. There are
577 rows and two columns (date and spend) in the data frame.
Let’s check for missing values: there are none (see the table
below), which is great! 👏
[Code]:
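The original outputs for this step were shown as images. A check along the following lines would produce the same kind of information; the small frame below is a made-up stand-in for the ad spend file, using the column names date and spend mentioned above.

```python
import pandas as pd

# Made-up stand-in for the ad spend file (columns: date, spend)
sample = pd.DataFrame({
    "date": pd.date_range("2017-06-01", periods=5, freq="D"),
    "spend": [210.5, 198.0, 205.2, 190.7, 202.3],
})

sample.info()                                # column types and non-null counts
n_rows, n_cols = sample.shape                # number of rows and columns
missing_per_column = sample.isnull().sum()   # missing values in each column
print(missing_per_column)
```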
Step 3: Plot time-series data
Y-Axis: Ad Spend; X-Axis: Date
The plot shows a roughly constant level (the mean daily spend is
200K USD). The seasonal and random fluctuations are roughly
constant in size over time. This suggests that it’s probably
appropriate to describe the data using an additive model, which is
what Prophet is built on.
Step 4: Modeling
Split the data set into a training set and a test set. The training
set contains daily ad spend from 2017-06-01 to 2018-11-30, while
the test set contains daily ad spend from 2018-12-01 to
2018-12-30. Here we would like to use the training set to predict
the next 30 days of ad spend.
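The split by date can be sketched as follows. The series is made up and only mirrors the date range described above; the column names ds and y are the ones Prophet expects for the date and the value.

```python
import pandas as pd

# Made-up daily series covering the same date range as the data set
dates = pd.date_range("2017-06-01", "2018-12-30", freq="D")
data = pd.DataFrame({"ds": dates, "y": range(len(dates))})

split_date = pd.Timestamp("2018-12-01")
train = data[data["ds"] < split_date]     # 2017-06-01 to 2018-11-30
test = data[data["ds"] >= split_date]     # 2018-12-01 to 2018-12-30
```

Splitting on a date rather than at random keeps the test period strictly after the training period, which is what a forecast has to face in practice.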
Let’s try a first model on its own, without setting any other parameters.
[Code]:
model1 = Prophet(interval_width=0.95)  # the default is 0.80
Setting interval_width=0.95 makes Prophet produce a 95% uncertainty
interval around the forecast instead of the default 80%.
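The fitting and forecasting steps are not shown in the text, so the function below is only a sketch of how they are usually written with Prophet's fit, make_future_dataframe and predict methods. The function name is hypothetical, and the training frame is assumed to have the columns ds and y.

```python
def forecast_next_days(train, periods=30, interval_width=0.95):
    """Fit a default model on train ('ds', 'y') and forecast `periods` days ahead."""
    from prophet import Prophet  # published as 'fbprophet' before version 1.0
    model = Prophet(interval_width=interval_width)
    model.fit(train)
    # The future frame contains the training dates plus `periods` new days
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    # forecast['yhat'] is the point forecast; 'yhat_lower' and 'yhat_upper'
    # bound the uncertainty interval
    return model, forecast
```

Calling model.plot(forecast) on the returned pair draws the kind of chart described next, with the observations as dots and the forecast as a line.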
Generate the forecasting plot below:
Y-Axis: Ad Spend; X-Axis: Date.
It’s always nice to check how the model performs on historical
data. (The deep blue line is the forecast spend, and the black dots
are the actual spend. The light blue shade is the 95% confidence
interval around the forecast.) From the plot, though the model
tries to fit all the data points smoothly, it fails to catch the
seasonality. Just applying Prophet by itself, the first model does
not do a good job of fitting the data.
For the second model, let’s apply some business insights to tweak
the first model. By just asking some business questions, such as
about seasonality trends and holiday event effects, it’s easy to
feed that information into Prophet.
We applied yearly_seasonality, weekly_seasonality, holidays (the
holiday events were created manually here, but you could also apply
the country holidays built into Prophet) and
changepoint_prior_scale to make the model flexible enough to fit
the data points. Then we added a monthly seasonality.
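The manually created holiday table is not shown, so here is a sketch of what such a frame could look like. Prophet expects the columns holiday and ds, with lower_window and upper_window as optional extras; the selection of holidays and the window sizes below are assumptions, not the author's actual list.

```python
import pandas as pd

# A few US public holidays inside the data's date range (not a complete list)
us_public_holidays = pd.DataFrame({
    "holiday": ["independence_day", "thanksgiving", "christmas", "new_year",
                "independence_day", "thanksgiving", "christmas"],
    "ds": pd.to_datetime(["2017-07-04", "2017-11-23", "2017-12-25", "2018-01-01",
                          "2018-07-04", "2018-11-22", "2018-12-25"]),
    "lower_window": 0,   # days before the holiday that are also affected
    "upper_window": 1,   # days after the holiday that are also affected
})
```

Holidays that fall inside the forecast period, such as Christmas 2018 here, need to be listed as well so that their effect is applied to the forecast. The built-in alternative is to call add_country_holidays(country_name='US') on the model before fitting.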
[Code]:
model2 = Prophet(interval_width=0.95,
                 yearly_seasonality=True, weekly_seasonality=True,
                 holidays=us_public_holidays, changepoint_prior_scale=2)
model2.add_seasonality(name='monthly', period=30.5,
                       fourier_order=5, prior_scale=0.02)
Generate the forecasting plot below:
Y-Axis: Ad Spend; X-Axis: Date.
From the plot, it seems that the second model is able to catch the
seasonality and fits the historical data very well. (The deep blue
line is the forecast spend, and the black dots are the actual
spend. The light blue shade is the 95% confidence interval around
the forecast.)
Check the trends and seasonality components:
From the yearly trend, spend went up right at the beginning of the
year and dropped sharply during June, August and December. The
weekly trend shows that weekdays played a big role here. And holiday
events have a negative effect on ad spend, meaning they drove ad
spend down, and so on. You could check this information against
your business domain knowledge.
Step 5: Validation
First let’s check the fit by visualizing the forecasting line and
observed line:
Y-Axis: Ad Spend; X-Axis: Date.
From the plot, it seems that the model is able to fit the data points
very well, though it doesn’t catch the pattern at the end of
December. However, remember that it takes only about 15 minutes to
input all the business information and get such a fair result. It
doesn’t require experience or expertise in time-series modeling or
machine learning to build. Almost every analyst is able to do it
(however, Python or R skills are a must-have. 🙃)
Usually, some popular error metrics such as Root Mean Square Error
(RMSE) and Mean Absolute Error (MAE) are used during model
evaluation. But I won’t discuss those metrics here since there is
only one model. (I will discuss them in my next post when comparing
Prophet and classic time-series models.)
Let’s look at the model performance by comparing the forecast
value and the observed value:
[Code]:
Though the predicted value is about 13% higher than the actual
value, the interval between the predicted value and the lower bound
is able to catch the actual value. So far, the model is doing fairly
well, and it took only about 15 minutes.
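The comparison code itself is not preserved in the text. One plausible way to make this kind of comparison, assuming it is done on totals over the test period, is sketched below. The three rows are made-up numbers chosen only to illustrate the arithmetic, and summing the interval bounds is a rough check rather than a proper interval for the total.

```python
import pandas as pd

# Made-up three-day excerpt; in practice use the forecast rows for the test period
comparison = pd.DataFrame({
    "y": [100.0, 120.0, 80.0],             # observed spend
    "yhat": [110.0, 130.0, 99.0],          # forecast spend
    "yhat_lower": [90.0, 100.0, 70.0],     # lower bound of the interval
    "yhat_upper": [130.0, 160.0, 128.0],   # upper bound of the interval
})

totals = comparison.sum()
# Percentage by which the forecast total exceeds the actual total
pct_over = (totals["yhat"] - totals["y"]) / totals["y"] * 100
# Rough check: does the actual total fall between the summed bounds?
actual_in_interval = totals["yhat_lower"] <= totals["y"] <= totals["yhat_upper"]
print(f"forecast is {pct_over:.1f}% above actual; within interval: {actual_in_interval}")
```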
4. Closing Summary
There are many kinds of time-series analysis we can explore from
here, such as anomaly detection and forecasting time series with
external data sources. We have only just started.
From the practical example, it seems that Prophet provides
completely automated forecasts, just as its official documentation
states. It’s fast and productive, which would be very useful if your
organization doesn’t have a very solid data science team handling
predictive analytics. It saves you time in answering internal
stakeholders’ or clients’ forecasting questions without spending
too much effort building an amazing model based on classic
time-series modeling techniques.
In the next post, I will compare Prophet with classic time-series
forecasting techniques such as the ARIMA model, focusing on
efficiency and performance.
References and useful sources:
Facebook Prophet official documentation: a must-read if you would
like to play with Prophet.
An Intro to Facebook Prophet: explains in general terms what
time-series analysis is and gives an overview of Facebook Prophet.
Generate Quick and Accurate Time Series Forecasts using
Facebook’s Prophet (with Python & R codes): a brief introduction
to Facebook Prophet in both R and Python. It might be useful to you
if you are an R user.
Introduction
Understanding time-based patterns is critical for any business. Questions such as how much
inventory to maintain, how much footfall to expect in your store, and how many people
will travel by an airline are all important time series problems to solve.
This is why time series forecasting is one of the must-know techniques for any data
scientist. From predicting the weather to the sales of a product, it is integrated into the data
science ecosystem and that makes it a mandatory addition to a data scientist’s skillset.
If you are a beginner, time series also provides a good way to start working on real life
projects. You can relate to time series very easily and they help you enter the larger world
of machine learning.
Prophet is an open source library published by Facebook that is based on decomposable
(trend+seasonality+holidays) models. It provides us with the ability to make time series
predictions with good accuracy using simple intuitive parameters and has support for
including impact of custom seasonality and holidays!
In this article, we shall cover some background on how Prophet fills the existing gaps in
generating fast reliable forecasts followed by a demonstration using Python. The final
results will surprise you!
Table of Contents
1. What’s new in Prophet?
2. The Prophet Forecasting Model
o Trend
Saturating growth
Changepoints
o Seasonality
o Holidays and events
3. Prophet in action (using Python & R)
o Trend Parameters
o Seasonality and Holiday Parameters
o Predicting passenger traffic using Prophet
What’s new in Prophet?
When a forecasting model doesn’t run as planned, we want to be able to tune the
parameters of the method with regards to the specific problem at hand. Tuning these
methods requires a thorough understanding of how the underlying time series models work.
The first input parameters to automated ARIMA, for instance, are the maximum orders of
the differencing, the auto-regressive components, and the moving average components. A
typical analyst will not know how to adjust these orders to avoid the behaviour and this is
the type of expertise that is hard to acquire and scale.
The Prophet package provides intuitive parameters which are easy to tune. Even someone
who lacks expertise in forecasting models can use this to make meaningful predictions for a
variety of problems in a business scenario.
The Prophet Forecasting Model
We use a decomposable time series model with three main model components: trend,
seasonality, and holidays. They are combined in the following equation:
y(t) = g(t) + s(t) + h(t) + εt
g(t): piecewise linear or logistic growth curve for modelling non-periodic changes in
time series
s(t): periodic changes (e.g. weekly/yearly seasonality)
h(t): effects of holidays (user provided) with irregular schedules
εt: error term accounts for any unusual changes not accommodated by the model
Using time as a regressor, Prophet is trying to fit several linear and non linear functions of
time as components. Modeling seasonality as an additive component is the same approach
taken by exponential smoothing in Holt-Winters technique . We are, in effect, framing the
forecasting problem as a curve-fitting exercise rather than looking explicitly at the time
based dependence of each observation within a time series.
Trend
Trend is modelled by fitting a piecewise linear curve over the trend, or non-periodic part,
of the time series. The linear fitting exercise ensures that it is least affected by
spikes or missing data.
Saturating growth
An important question to ask here is – Do we expect the target to keep growing/falling for
the entire forecast interval?
More often than not, there are cases with non-linear growth with a running maximum
capacity. I will illustrate this with an example below.
Let’s say we are trying to forecast number of downloads of an app in a region for the next
12 months. The maximum downloads is always capped by the total number of smartphone
users in the region. The number of smartphone users will also, however, increase with time.
With domain knowledge at his/her disposal, an analyst can then define a
varying capacity C(t) for the time series forecasts he/she is trying to make.
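As a small illustration of a capacity that grows over time, the sketch below evaluates a logistic growth curve of the form g(t) = C(t) / (1 + exp(-k (t - m))) with NumPy. The growth rate k, the midpoint m and the capacity numbers are made up for the app download example and are not fitted to any data.

```python
import numpy as np

def logistic_trend(t, capacity, k, m):
    """Logistic growth g(t) = C(t) / (1 + exp(-k * (t - m)))."""
    return capacity / (1.0 + np.exp(-k * (t - m)))

t = np.arange(12)                 # months 0..11
capacity = 1000.0 + 20.0 * t      # made-up C(t): smartphone users grow each month
downloads = logistic_trend(t, capacity, k=0.8, m=5.0)
```

The curve never exceeds the capacity in any month, and at the midpoint t = m it sits at exactly half of that month's capacity.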
Changepoints
Another question to answer is whether my time series encounters any underlying changes
in the phenomena e.g. a new product launch, unforeseen calamity etc. At such points, the
growth rate is allowed to change. These changepoints are automatically selected. However,
a user can also feed the changepoints manually if it is required. In the below plot, the dotted
lines represent the changepoints for the given time series.
As the number of changepoints allowed is increased the fit becomes more flexible. There
are basically 2 problems an analyst might face while working with the trend component:
Overfitting
Underfitting
A parameter called changepoint_prior_scale can be used to adjust the trend flexibility and
tackle the above 2 problems. A higher value will fit a more flexible curve to the time series.
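To see what a changepoint does to the trend, here is a minimal NumPy sketch of a piecewise linear trend whose slope changes by a given amount at each changepoint while the line stays continuous. The function name and the numbers are illustrative; in Prophet the size of each slope change is estimated from the data, and changepoint_prior_scale controls how large those changes are allowed to be.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend with base slope k and offset m; the slope changes by deltas[j] at changepoints[j]."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    active = (t[:, None] >= s[None, :]).astype(float)  # 1 where a changepoint has passed
    slope = k + active @ d
    offset = m + active @ (-s * d)                     # keeps the line continuous
    return slope * t + offset

t = np.arange(10)
# Slope 1 until t = 5, then slope 3 afterwards
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5], deltas=[2.0])
```

With large slope changes at many changepoints the trend can follow every wiggle in the data (overfitting); with changes forced close to zero it collapses to a single straight line (underfitting).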
Seasonality
To fit and forecast the effects of seasonality, Prophet relies on Fourier series to provide a
flexible model. Seasonal effects s(t) are approximated by the following function:
$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)$$
P is the period (365.25 for yearly data and 7 for weekly data).
Parameters [a1, b1, ….., aN, bN] need to be estimated for a given N to model seasonality.
The Fourier order N, which defines whether high-frequency changes are allowed to be
modelled, is an important parameter to set here. For a time series, if the user believes the
high-frequency components are just noise and should not be considered for modelling,
he/she could set N to a lower value. If not, N can be tuned to a higher
value and set using the forecast accuracy.
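The role of N and P can be sketched with NumPy by building the cosine and sine columns for n = 1..N and combining them with a set of coefficients. The coefficient values below are made up; in a fitted model they are the estimated parameters [a1, b1, ..., aN, bN].

```python
import numpy as np

def fourier_features(t, period, order):
    """Columns cos(2*pi*n*t/P), sin(2*pi*n*t/P) for n = 1..order."""
    t = np.asarray(t, dtype=float)
    cols = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        cols.append(np.cos(angle))
        cols.append(np.sin(angle))
    return np.column_stack(cols)

t = np.arange(28)                                   # four weeks of daily data
X = fourier_features(t, period=7, order=3)          # shape (28, 6)
coef = np.array([1.0, 0.5, 0.0, 0.2, 0.0, 0.0])     # made-up [a1, b1, a2, b2, a3, b3]
s = X @ coef                                        # seasonal effect s(t)
```

A larger order adds faster-oscillating columns, so the seasonal curve can bend more sharply; the result still repeats every P time units.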
Holidays and events
Holidays and events incur predictable shocks to a time series. For instance, Diwali in India
occurs on a different day each year and a large portion of the population buy a lot of new
items during this period.
Prophet allows the analyst to provide a custom list of past and future events. A window
around such days is considered separately, and additional parameters are fitted to model
the effect of holidays and events.
Prophet in action (using Python)
Currently, implementations of Prophet are available in both Python and R. They have exactly
the same features.
The Prophet() function is used to define a Prophet forecasting model in Python. Let us look at
the most important parameters:
3.1 Trend parameters
Parameter Description
growth ‘linear’ or ‘logistic’ to specify a linear or logistic trend
changepoints List of dates at which to include potential changepoints (automatic if not specified)
n_changepoints If changepoints is not supplied, you may provide the number of changepoints to be automatically included
changepoint_prior_scale Parameter for changing flexibility of automatic changepoint selection
3.2 Seasonality & Holiday Parameters
Parameter Description
yearly_seasonality Fit yearly seasonality
weekly_seasonality Fit weekly seasonality
daily_seasonality Fit daily seasonality
holidays Feed dataframe containing holiday name and date
seasonality_prior_scale Parameter for changing strength of seasonality model
holiday_prior_scale Parameter for changing strength of holiday model
yearly_seasonality, weekly_seasonality & daily_seasonality can take the values True, False
or the number of Fourier terms, which was discussed in the last section. If the value is True, the
default number of Fourier terms is used (10 for yearly seasonality). Prior scales are defined to tell
the model how strongly it needs to consider the seasonal/holiday components while fitting and forecasting.
Predicting passenger traffic using Prophet
Now that we are well versed in the nuts and bolts of this amazing tool, let’s dive into a real
dataset to see its potential. Here I have used Prophet in Python for one of the practice
problems available on the datahack platform at this link.
The dataset is a univariate time series that contains hourly passenger traffic for a new
public transport service. We are trying to forecast the traffic for the next 7 months given
historical traffic data for the last 25 months. Basic EDA for this can be accessed from
this course.
Import the necessary packages and read the dataset
# Import packages
import pandas as pd
import numpy as np
from fbprophet import Prophet
# Read train and test
train = pd.read_csv('Train_SU63ISt.csv')
test = pd.read_csv('Test_0qrQsBZ.csv')
# Convert to datetime format
train['Datetime'] = pd.to_datetime(train.Datetime,format='%d-%m-%Y %H:%M')
test['Datetime'] = pd.to_datetime(test.Datetime,format='%d-%m-%Y %H:%M')
train['hour'] = train.Datetime.dt.hour
We see that this time series has a lot of noise. We could re-sample it day-wise and sum to
get a new series with reduced noise that is thereby easier to model.
# Calculate the average fraction of daily traffic that falls in each hour
# (select 'Count' explicitly so the ID and Datetime columns are not averaged)
hourly_mean = train.groupby('hour')['Count'].mean()
hourly_frac = (hourly_mean / hourly_mean.sum()).to_frame('fraction')
# convert to time series from dataframe
train.index = train.Datetime
train.drop(['ID','hour','Datetime'], axis = 1, inplace = True)
daily_train = train.resample('D').sum()
Prophet requires the variable names in the time series to be:
y – Target
ds – Datetime
So, the next step is to convert the dataframe according to the above specifications
daily_train['ds'] = daily_train.index
daily_train['y'] = daily_train.Count
daily_train.drop(['Count'],axis = 1, inplace = True)
Fitting the prophet model:
m = Prophet(yearly_seasonality = True, seasonality_prior_scale=0.1)
m.fit(daily_train)
future = m.make_future_dataframe(periods=213)
forecast = m.predict(future)
We can look at the various components using the following command:
m.plot_components(forecast)
Using the mean hourly fraction for each hour from 0 to 23, we can then convert the daily
forecasts into hourly forecasts and make a submission. This is what our forecasts over the
daily data look like.
# Prophet names its date column 'ds'; rename it to match the test set
forecast = forecast.rename(columns={'ds': 'Datetime'})
# Extract day, month and year from both dataframes to merge on
for df in [test, forecast]:
    df['day'] = df.Datetime.dt.day
    df['month'] = df.Datetime.dt.month
    df['year'] = df.Datetime.dt.year
# The hour is only needed on the hourly test set
test['hour'] = test.Datetime.dt.hour
# Merge daily forecasts with the given IDs
daily = forecast[['day', 'month', 'year', 'yhat']]
test = pd.merge(test, daily, on=['day', 'month', 'year'], how='left')
cols = ['ID', 'hour', 'yhat']
test_new = test[cols]
# Merge the hourly average fraction into the test data
test_new = pd.merge(test_new, hourly_frac, left_on=['hour'], right_index=True, how='left')
# Convert the daily aggregate to hourly traffic
test_new['Count'] = test_new['yhat'] * test_new['fraction']
test_new.drop(['yhat', 'fraction', 'hour'], axis=1, inplace=True)
test_new.to_csv('prophet_sub.csv', index=False)
This gets a score of 206 on the public leaderboard and does produce a stable model.
Readers can go ahead and tweak the hyperparameters (Fourier order for
seasonality/changeover) to get a better score. Readers could also try a different
technique to convert the daily predictions to hourly data for submission, which may give a
better score.