INDU 342: Logistics Network Models
Forecasting
Claudio Contardo
Mechanical, Industrial and Aerospace Engineering
Concordia University
Lecture 3
Histograms
Histograms can help identify (or at least estimate) the
underlying probability distribution governing a random variable
First: Divide the range between MIN and MAX into several equal-width intervals
Second: Compute the percentage of the total observations that fall
within each interval
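The two steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name, the sample values, and the choice of assigning the maximum value to the last interval are assumptions made here.

```python
def relative_frequencies(data, n_bins):
    """Split [min, max] into n_bins equal-width intervals and return
    (edges, percentages); percentages[i] is the share of observations
    that fall in interval i. Assumes max(data) > min(data)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in data:
        # The maximum value is assigned to the last interval.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    percentages = [100.0 * c / len(data) for c in counts]
    return edges, percentages


# Hypothetical observations (illustrative values only).
times = [10, 12, 13, 15, 15, 16, 18, 20]
edges, pct = relative_frequencies(times, n_bins=5)
```

The percentages always sum to 100, which is a quick sanity check on the binning.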
Mumbarai
In the Mumbarai problem, the logistics manager wants to verify visually
whether the parent distribution of the AGV travel times is normal. The
manager plots, in the same diagram, the density histogram of the available
data and the probability density function of the normal distribution whose
mean and standard deviation equal the sample mean and the sample
standard deviation of the data. Visual inspection of the resulting
plot supports the intuition of the logistics manager
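A numerical counterpart of this visual check can be sketched as below: for each histogram bar, the density is compared with the fitted normal density at the bar's center. The function name and the travel times are hypothetical; the Mumbarai data are not reproduced here.

```python
from statistics import NormalDist


def density_vs_normal(data, n_bins):
    """Return a list of (bin_center, histogram_density, normal_pdf) rows."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    # Normal distribution with the sample mean and sample standard deviation.
    fitted = NormalDist.from_samples(data)
    rows = []
    for i, c in enumerate(counts):
        center = lo + (i + 0.5) * width
        density = c / (len(data) * width)  # bar areas sum to 1
        rows.append((center, density, fitted.pdf(center)))
    return rows


# Hypothetical travel times (illustrative values only).
times = [41, 44, 45, 47, 48, 48, 50, 51, 53, 57]
rows = density_vs_normal(times, n_bins=4)
```

If the two density columns are close for every bar, the normality assumption looks plausible; a formal goodness-of-fit test would be needed for a rigorous conclusion.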
Figure: Travel times histogram
Boxplots
Boxplot
A boxplot (or box-and-whisker plot) is a graphical display depicting
numerical data through their quartiles. In particular, the plot contains a
box whose extreme sides represent quartiles Q25 and Q75 . The box also
includes an internal line representing the median value and, possibly, a
small triangle associated with the mean of the data. In addition, the
representation includes two lines (called whiskers), extending from the
box to the minimum and to the maximum data values which are not
outliers, respectively
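The quantities drawn in a boxplot can be computed as in the sketch below. Two assumptions are made here: quartiles use the "inclusive" interpolation convention of Python's `statistics.quantiles`, and outliers are defined by the common 1.5 × (Q75 − Q25) fences. The function name and data are hypothetical.

```python
from statistics import mean, quantiles


def boxplot_stats(data, k=1.5):
    """Compute the box, the median and mean markers, and the whisker ends."""
    q25, q50, q75 = quantiles(data, n=4, method="inclusive")
    iqr = q75 - q25
    lo_fence, hi_fence = q25 - k * iqr, q75 + k * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {
        "q25": q25, "median": q50, "q75": q75, "mean": mean(data),
        # Whiskers reach the extreme values that are not outliers.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [x for x in data if x < lo_fence or x > hi_fence],
    }


# Hypothetical data with one extreme value.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```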
Ravaioli
Ravaioli is an Italian producer of fresh pasta, renowned for its ravioli. The
sales of its spinach ravioli Nonna Pina are influenced heavily by the
company’s TV and social media advertising. The following table reports the
latest 20 weekly expenditures (in k€) made by the company in TV and
social media advertising, together with the sales (also in k€) realized
in the same time periods. In order to devise sales forecasts for inventory
planning, the logistics manager first performed an exploratory data
analysis (EDA). This phase included generating boxplots of the company’s TV
and social media advertising expenditures
Figure: Ravaioli’s expenditures in marketing
Figure: Boxplot annotated with the average, max, Q75, median, Q25, and min
Time series plots
Time series plot
A plot of a time series yt , t = 1 . . . T is a Cartesian diagram (t, yt ) in
which the horizontal axis shows graduations t = 1 . . . T of time using an
appropriate scale (weeks, months, quarters, years), while the vertical axis
shows the corresponding numerical values yt for each t
Example: Ravaioli
The EDA performed by the logistics manager of Ravaioli includes the
generation of the plot of the weekly sales of spinach ravioli
Figure: Weekly sales of spinach ravioli (in k€) in the Ravaioli problem
Bivariate EDA
Sample covariance
Given two time series $x_t, y_t$, $t = 1, \dots, T$, their sample covariance is
$$v_{xy} = \frac{1}{T} \sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})$$
Since x and y may be expressed in different units, it is best to normalize the
covariance by the sample standard deviations $S_x$, $S_y$
$$r_{xy} = \frac{v_{xy}}{S_x S_y}$$
The term rxy is also referred to as the Pearson correlation coefficient
If $r_{xy}$ is substantially less than zero ⇒ x, y are negatively correlated
If $r_{xy}$ is substantially greater than zero ⇒ x, y are positively
correlated
If $r_{xy}$ is close to zero ⇒ x, y show no linear correlation
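The sample covariance and the Pearson coefficient can be computed as in this sketch. It assumes that the standard deviations use the same 1/T normalization as the covariance, so that the coefficient lies in [-1, 1]; function names and data are hypothetical.

```python
from math import sqrt


def sample_covariance(x, y):
    """Covariance with the 1/T normalization."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / T


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    s_x = sqrt(sample_covariance(x, x))  # standard deviation of x (1/T form)
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Hypothetical series: y grows with x, z decreases with x.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
z = [8, 6, 4, 2]
r_xy = pearson(x, y)
r_xz = pearson(x, z)
```

Here `r_xy` is (up to rounding) +1 and `r_xz` is -1, the two extreme cases of perfect linear dependence.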
Scatterplot
Scatterplot
A scatterplot is a diagram using Cartesian coordinates to display
corresponding values for two numerical variables $x_t, y_t$, $t = 1, \dots, T$, of a
dataset
Example: Ravaioli
The findings of Ravaioli’s logistics manager about the quality of x1
and x2 as predictors are confirmed by the 2D scatterplots shown in the
following two figures
Figure: (a) TV advertising vs sales; (b) Social media advertising vs TV advertising
Data preprocessing
Data preprocessing
Data preprocessing is the process of cleaning, interpolating, aggregating,
or transforming data before they are used to make forecasts
Insertion of missing data
Simplest case: replace missing value with average of previous and
subsequent observations
Figure: Number of cars sold per month
In this case we can set
x6 = (x5 + x7)/2 = (38,521 + 41,345)/2 = 39,933
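The same rule can be applied programmatically, as in the sketch below. Missing observations are encoded as `None` (an assumption made here), and only isolated gaps are filled; the function name is hypothetical.

```python
def fill_missing(series):
    """Return a copy of `series` in which every isolated missing value (None)
    is replaced by the average of the previous and the next observation."""
    filled = list(series)
    for t in range(1, len(series) - 1):
        neighbors_known = series[t - 1] is not None and series[t + 1] is not None
        if series[t] is None and neighbors_known:
            filled[t] = (series[t - 1] + series[t + 1]) / 2
    return filled


# Only x5 and x7 are taken from the example in the text.
x5, x7 = 38521, 41345
x6 = fill_missing([x5, None, x7])[1]
```

Runs of consecutive missing values are left untouched by this sketch; they would need a different interpolation scheme.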
Outlier detection
Data may contain errors due to device failures, human errors, or even
natural deviations. Such outliers can often lead to misleading forecasts.
Identifying them can be challenging since there is no universal rule for
deciding whether an observation is an outlier; the decision is very
application-dependent
Rule of thumb for outlier detection
Applicable if the trend is constant and no cyclical components are observed:
Compute Q25, Q75 for the dataset
Discard every data point xt such that xt < Q25 − 1.5(Q75 − Q25) or
xt > Q75 + 1.5(Q75 − Q25)
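A minimal sketch of this rule of thumb follows. The quartile convention ("inclusive" interpolation) is an assumption, since other conventions give slightly different fences; the function name and the sales figures are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(data, k=1.5):
    """Return (period, value) pairs lying outside
    [Q25 - k(Q75 - Q25), Q75 + k(Q75 - Q25)]; periods are numbered from 1."""
    q25, _, q75 = quantiles(data, n=4, method="inclusive")
    iqr = q75 - q25
    low, high = q25 - k * iqr, q75 + k * iqr
    return [(t, x) for t, x in enumerate(data, start=1) if x < low or x > high]


# Hypothetical monthly sales with one suspicious value.
sales = [900, 950, 870, 990, 200, 930, 960, 910]
flagged = iqr_outliers(sales)
```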
Example: Elleshop
Elleshop distributes electrical appliances in Austria. The table below
reports its sales of smart LED TV sets in the province of Klagenfurt
during the last 12 months. Since the trend is constant and there are no
cyclical components, the above-mentioned rule of thumb is used
Figure: Number of smart TV sets delivered monthly
Q25 = 866.25, Q75 = 977.5
The interval
[Q25 − 1.5(Q75 − Q25 ), Q75 + 1.5(Q75 − Q25 )] = [699.38, 1144.38]
The sales amount reported in month 8 (200) is identified as an
outlier and therefore removed
We replace it with the average sales reported for months 7 and 9
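The interval can be rechecked from the two quartiles reported above. The helper name is hypothetical; the monthly sales table itself is not reproduced here, so only the bounds and the month 8 value are used.

```python
def iqr_interval(q25, q75, k=1.5):
    """Acceptance interval [Q25 - k(Q75 - Q25), Q75 + k(Q75 - Q25)]."""
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr


# Quartiles reported in the Elleshop example.
low, high = iqr_interval(866.25, 977.5)
month_8_is_outlier = not (low <= 200 <= high)
```

The bounds evaluate to 699.375 and 1144.375, which match the rounded interval in the example, and the month 8 value of 200 falls below the lower bound.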
Data aggregation
It consists in merging disaggregated data from multiple sources (e.g.
monthly sales from individual retailers in a given district) into a single
time series (e.g. overall sales in a given district). Aggregating stochastic
data reduces their relative variability, which makes forecasts more accurate
Variability of aggregated data
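Let $X_1, \dots, X_n$ be iid with expected value $\mu$ and standard deviation $\sigma$, and let $Y = X_1 + \dots + X_n$
$\mu_Y = n\mu$
$\sigma_Y^2 = n\sigma^2$
Therefore $\sigma_Y / \mu_Y = \frac{1}{\sqrt{n}} \, \sigma / \mu$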
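The reduction in relative variability can be checked with a small simulation. This sketch assumes normally distributed demands (the result only requires iid variables) and uses hypothetical parameter values and function names.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def theoretical_cv(mu, sigma, n):
    """Coefficient of variation of the sum of n iid variables."""
    return sigma / (sqrt(n) * mu)


def simulated_cv(mu, sigma, n, replications=5000, seed=0):
    """Estimate the same ratio by simulating the aggregate Y many times."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(mu, sigma) for _ in range(n))
              for _ in range(replications)]
    return pstdev(totals) / mean(totals)


# Hypothetical demand parameters: mean 100, standard deviation 20.
cv_1 = theoretical_cv(100, 20, 1)   # a single source
cv_4 = theoretical_cv(100, 20, 4)   # four aggregated sources
cv_4_sim = simulated_cv(100, 20, 4)
```

Aggregating four sources halves the coefficient of variation, and the simulated value should be close to the theoretical one.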
For the same reason, it may sometimes be convenient to aggregate data
if the fine granularity of a time series leads to too much variability
Daily sales vs weekly sales vs quarterly sales
Sales of cars of a particular make/model/year vs Sales of SUV cars
of a given maker
Sales in multiple small districts vs aggregate sales in a larger area
Removing calendar variations
Time series representing a cumulative amount over a time period (e.g.
monthly sales of a product) may contain calendar effects due to the
varying length of months or weeks. Potential solution:
Define $w_t = n/n_t$ (e.g. n = average number of business days in a
month; $n_t$ = number of business days in month t)
Replace $y_t$ by $y'_t = w_t y_t$
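A sketch of this adjustment is given below. It assumes that n is estimated as the average number of business days over the months in the sample; the function name and the figures are hypothetical.

```python
def remove_calendar_variation(y, business_days):
    """Rescale each observation by w_t = n / n_t, where n is the average
    number of business days over the sample."""
    n = sum(business_days) / len(business_days)
    return [n / n_t * y_t for y_t, n_t in zip(y, business_days)]


# Hypothetical monthly sales that are proportional to the business days.
sales = [2000, 2200, 2100]
days = [20, 22, 21]
adjusted = remove_calendar_variation(sales, days)
```

In this constructed case the adjusted series is flat (about 2100 per month), showing that the apparent variation was purely a calendar effect.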
Deflating monetary time series
For a time series impacted by inflation (e.g. yearly sales of a product over
a 10-year period), it may be convenient to deflate the data (compare
apples with apples)
Example: Cavis
Cavis is a wine-making company that sells its products almost exclusively
in France. The annual sales (in M€) over the last 10 years are reported
in the left table, which also shows the annual rate of inflation
recorded over the decade. The deflated data are reported in the right table
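Deflation can be sketched as follows. The convention used here is an assumption, since the text does not state the formula: values are expressed in the prices of the first period by dividing by a cumulative price index. The function name and the figures are hypothetical, not the Cavis data.

```python
def deflate(values, inflation_rates):
    """Express `values` in first-period prices.

    inflation_rates[t] is the inflation (as a fraction) between period t-1
    and period t; the entry for the first period is ignored.
    """
    index = 1.0  # cumulative price index, equal to 1 in the first period
    deflated = []
    for t, (y, rate) in enumerate(zip(values, inflation_rates)):
        if t > 0:
            index *= 1.0 + rate
        deflated.append(y / index)
    return deflated


# Hypothetical sales growing exactly at the inflation rate (10% per year).
real_sales = deflate([100.0, 110.0, 121.0], [0.0, 0.10, 0.10])
```

Here the nominal growth is entirely due to inflation, so the deflated series stays at about 100.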
Adjusting for population variations
When forecasting some economic variables such as sales in a certain
geographic area, demographic variations need to be taken into account.
Let $a_t$ be the population of a given market in time period t and let
$y_t$, $t = 1, \dots, T$, be the time series. Then, forecasts are devised on
$$y'_t = \frac{a_1}{a_t} y_t$$
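A sketch of the population adjustment follows, reading the formula as a rescaling of each observation to the population of the first period, that is, multiplying y_t by a_1 / a_t. That reading, the function name, and the figures are assumptions made here.

```python
def adjust_for_population(y, population):
    """Rescale y_t to the population of the first period: y_t * a_1 / a_t."""
    a_1 = population[0]
    return [a_1 / a_t * y_t for y_t, a_t in zip(y, population)]


# Hypothetical case: customers grow only because the population grows.
customers = [100, 110, 121]
inhabitants = [1000, 1100, 1210]
adjusted = adjust_for_population(customers, inhabitants)
```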
Example: Salus
Salus is a private company providing home care services for the elderly in
the Lombardy region, Italy. The annual number of customers over the
past decade is shown in the left Table below. Considering the annual
population of Lombardy (second and sixth columns of the right Table
below) over the same 10 years, the modified time series is obtained as
shown in the fourth and eighth columns.
Data normalization
Make data fit into an interval $[m, M]$, $m < M$. If $y^{MIN}$, $y^{MAX}$ represent
the min and max values of $y_t$, $t = 1, \dots, T$, we let
$$y'_t = \frac{y_t - y^{MIN}}{y^{MAX} - y^{MIN}} (M - m) + m$$
The most usual form of normalization is the [0, 1]-normalization, obtained for
m = 0, M = 1
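The normalization can be sketched as below. The function name and the data are hypothetical, and the series is assumed to contain at least two distinct values so that the denominator is not zero.

```python
def normalize(y, m=0.0, M=1.0):
    """Map the series linearly onto [m, M]; the minimum goes to m, the maximum to M."""
    y_min, y_max = min(y), max(y)
    return [(v - y_min) / (y_max - y_min) * (M - m) + m for v in y]


# Hypothetical series.
unit = normalize([10, 15, 20])             # [0, 1]-normalization
symmetric = normalize([10, 15, 20], -1, 1)  # normalization onto [-1, 1]
```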
Classification of time series
Intermittent vs continuous
Figure: (a) Intermittent time series; (b) Continuous time series
Regular time series
A time series is said to be regular if it can be decomposed into
Trend. Long-term modification of a data pattern over time
Cycle. Long-term fluctuations due to the business cycle, which
depends on macroeconomic issues. Four phases: prosperity (or
boom), recession, depression, and recovery
Seasonality. Recurrence of a pattern at regular intervals,
caused by the periodicity of human activities: Christmas sales
season, summer season for ice creams, gym registrations in January,
etc.
Error. Also called residual component or noise, it is the irregular
component of the historical data
Regular time series
Figure: Example of a regular time series
Explanatory methods
Linear regression
Given a set of explanatory variables $x_{ti}$ and an outcome $y_t$, we want to
build a model $y = w^T x + \epsilon$ that approximates y
as a linear function of the explanatory variables plus a random error
How to find w?
The vector w is chosen as the one that minimizes the sum of squared
errors, namely
$$SSE = \sum_{t=1}^{T} (y_t - w^T x_t)^2. \quad (1)$$
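The first-order optimality conditions for a minimizer $w^*$ lead to the
following identity, where X is the matrix whose rows are the $x_t^T$ and y is
the vector of outcomes $y_t$
$$\nabla_w SSE = -2X^T y + 2X^T X w^* = 0$$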
If $X^T X$ is non-singular, then
$$w^* = (X^T X)^{-1} X^T y$$
Otherwise, the columns of X are linearly dependent (at least one
explanatory variable is a linear combination of the others)
If $\det(X^T X) \approx 0$ the matrix is ill-conditioned ⇒ forecast very
sensitive to the input data
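The normal equations can be solved directly, as in the sketch below. It is a plain Gaussian elimination on the system $(X^T X) w = X^T y$, written for illustration only: the function name, the singularity tolerance, and the data are assumptions, and a column of ones is added to X to model an intercept.

```python
def solve_normal_equations(X, y, tol=1e-12):
    """Return w such that (X^T X) w = X^T y, using Gaussian elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(p)]
         + [sum(X[t][i] * y[t] for t in range(n))] for i in range(p)]
    for col in range(p):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < tol:
            raise ValueError("X^T X is (numerically) singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated by y = 1 + 2x; first column of X is the intercept.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
w = solve_normal_equations(X, y)
```

With two identical columns in X the function raises `ValueError`, which is the linearly dependent case described above. In practice a numerically stable least-squares routine is usually preferred over forming $X^T X$ explicitly.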