[go: up one dir, main page]

0% found this document useful (0 votes)
14 views14 pages

Anomaly Detection

This document presents a methodology for robustly monitoring time series data to detect outliers and level shifts. The proposed approach combines ideas from robust regression with alternating least squares to fit a model to time series data that may include trends, seasonal patterns, outliers, and level shifts. A "double wedge plot" is introduced to graphically display outliers and potential level shifts detected in the data. The methodology is applied to real trade volume time series containing a downward level shift. Results are also compared to other existing time series analysis methods. The approach aims to automatically analyze thousands of short time series to monitor for fraud in international trade data.

Uploaded by

Sakshi Chauhan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views14 pages

Anomaly Detection

This document presents a methodology for robustly monitoring time series data to detect outliers and level shifts. The proposed approach combines ideas from robust regression with alternating least squares to fit a model to time series data that may include trends, seasonal patterns, outliers, and level shifts. A "double wedge plot" is introduced to graphically display outliers and potential level shifts detected in the data. The methodology is applied to real trade volume time series containing a downward level shift. Results are also compared to other existing time series analysis methods. The approach aims to automatically analyze thousands of short time series to monitor for fraud in international trade data.

Uploaded by

Sakshi Chauhan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Econometrics and Statistics 9 (2019) 108–121

Contents lists available at ScienceDirect

Econometrics and Statistics


journal homepage: www.elsevier.com/locate/ecosta

Robust Monitoring of Time Series with Application to Fraud


Detection ✩
Peter Rousseeuw a, Domenico Perrotta b, Marco Riani c, Mia Hubert a,∗
a
Department of Mathematics, KU Leuven, Celestijnenlaan 200B, BE-3001 Leuven, Belgium
b
Joint Research Centre of the European Commission, Via E. Fermi 2479, TP 440I – 21027 Ispra, Italy
c
Department of Economics and Management, Via Kennedy 6, I – 43125 Parma, Italy

a r t i c l e i n f o a b s t r a c t

Article history: Time series often contain outliers and level shifts or structural changes. These unexpected
Received 16 November 2017 events are of the utmost importance in fraud detection, as they may pinpoint suspicious
Revised 20 May 2018
transactions. The presence of such unusual events can easily mislead conventional time
Accepted 24 May 2018
series analysis and yield erroneous conclusions. A unified framework is provided for de-
Available online 6 July 2018
tecting outliers and level shifts in short time series that may have a seasonal pattern. The
Keywords: approach combines ideas from the FastLTS algorithm for robust regression with alternating
Alternating least squares least squares. The double wedge plot is proposed, a graphical display which indicates out-
Double wedge plot liers and potential level shifts. The methodology was developed to detect potential fraud
Level shift cases in time series of imports into the European Union, and is illustrated on two such
Outliers series.
© 2018 The Author(s). Published by Elsevier B.V. on behalf of EcoSta Econometrics and
Statistics.
This is an open access article under the CC BY-NC-ND license.
(http://creativecommons.org/licenses/by-nc-nd/4.0/)

1. Introduction

When analyzing time series one often encounters unusual events such as outliers and structural changes, like those in
Fig. 1. Both series track trade volumes, and were extracted from the official trade statistics in the COMEXT database of
Eurostat. This database contains monthly trade volumes (aggregated over several transactions, possibly involving different
traders) of products imported in the European Union (EU) in a four-year period. The plot titles in Fig. 1 specify the code
of the traded product in the EU Combined Nomenclature classification (CN code), the country of origin, and the destination
(a member state of the EU). The CN code determines whether the volumes are expressed in tons of net mass and/or other
units (liters, number of items, etc.), the rate of customs duty applied, and how the goods are treated for statistical purposes.


The work of Peter Rousseeuw and Mia Hubert was supported by project C16/15/068 of Internal Funds KU Leuven. Peter Rousseeuw, Marco Riani and
Mia Hubert gratefully acknowledge support from the CRoNoS project, with reference CRoNoS COST Action IC1408. This research benefits from the High
Performance Computing facility of the University of Parma. The work of Domenico Perrotta was supported by the Project “Automated Monitoring Tool on
External Trade Step 5” of the Joint Research Centre and the European Anti-Fraud Office of the European Commission, under the Hercule-III EU programme.
In this paper, references to specific countries and products are made only for purposes of illustration and do not necessarily refer to cases investigated or
under investigation by anti-fraud authorities.

Corresponding author.
E-mail address: mia.hubert@kuleuven.be (M. Hubert).

https://doi.org/10.1016/j.ecosta.2018.05.001
2452-3062/© 2018 The Author(s). Published by Elsevier B.V. on behalf of EcoSta Econometrics and Statistics. This is an open access article under the CC
BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/)
P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121 109

Fig. 1. Monthly trade volumes of two products imported in the European Union in a four-year period: (a) imports of plants used primarily in perfumery,
pharmacy or for insecticidal, fungicidal or similar purposes, from Kenya into the UK (P12119085-KE-GB); (b) import of sugars including chemically pure
lactose, maltose, glucose and fructose, sugar syrups, artificial honey and caramel, from the Ukraine into Lithuania (P17049075-UA-LT).

The data quality is quite heterogeneous across countries and products, but some macroscopic outliers (manifest errors) have
already been removed or corrected by statistical authorities and customs services.
Both of these time series exhibit a downward level shift. Knowing when such structural breaks occur is important for
fraud detection. For instance, a sudden reduction in trade volume may coincide with an increase for a related product or
another country of origin, which could indicate a misdeclaration with the intent of deflecting customs duties.
There are many products and countries of origin in the CN classification, but not all of these combinations occur and the
number of products at risk of fraud is relatively small. Still, the number of relevant combinations of a product at fraud risk,
a country of origin and a country of destination is around 16,0 0 0. As a result, every month around 16,0 0 0 time series need
to be analyzed for anti-fraud purposes. This requires an automatic approach that is able to report accurate information on
outliers and the positions and amplitudes of level shifts, and that runs fast enough for that time frame. The method pro-
posed in this paper meets those objectives, and provides a graphical display that can be looked at whenever the automatic
monitoring system detects a significant level shift. Our method follows the approach which first computes a robust fit to
the majority of the data and then detects outliers by their large residuals, as described in the review paper (Rousseeuw and
Hubert, 2018).
A different statistical approach to monitor international trade data for fraud was proposed by Barabesi et al. (2016) who
tested whether the distribution of trade volumes follows the Newcomb–Benford law. In the current paper we also take the
time sequence of the trades into account. We will focus on a parametric approach to estimate level shifts, which differs
from the nonparametric smoothing methods in Fried and Gather (2007) or robust methods for REGARIMA models (Bianco
et al., 2001). A popular technique is the X-13 ARIMA-SEATS Seasonal Adjustment methodology (Findley et al., 1998; U.S.
Census Bureau, 2017). X-13 is based on automatic fitting of ARIMA models and includes detection of additive outliers and
level shifts. We will compare our results with those of X-13 in Section 5. See also Galeano and Peña (2013) for a review of
robust modeling of linear and nonlinear time series.
Although this paper was motivated by the need to analyze many short time series of trade data, we will describe the
methodology more generally so it can be applied to other types of time series that may be longer and can be modeled with
more parameters.
The structure of the paper is as follows. In Section 2 we introduce our model and methodology for robustly analyzing
a time series which contains a trend, a seasonal component and possibly a level shift in an unknown position, as well as
isolated or consecutive outliers. In Section 3 we illustrate the proposed approach using the well-known airline data (Box and
Jenkins, 1976), as well as contaminated versions of it in order to test the ability of the method to detect anomalies. In this
section we also introduce the double wedge plot, which visualizes the presence of a level shift and outliers. In Section 4 we
apply our methodology to the time series in Fig. 1. Section 5 compares our results to those obtained by a nonparametric
method and to X-13. The case where more than one level shift occurs is discussed in Section 6. Section 7 concludes, and
the Appendix proves a result about our algorithm.

2. Methodology

2.1. The model

The time series y(t ) = yt (for t = 1, . . . , T ) we will consider may contain the following terms:

1. a polynomial trend, i.e. Aa=0 αa t a ;
110 P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121

2. a seasonal component, i.e.


    

B
2π b 2π b
St = βb,1 cos t + βb,2 sin t . (1)
12 12
b=1

When B = 1 this is periodic with a one-year period, B = 2 corresponds with a six-month  period etc. We assume the

amplitude of the seasonal component varies over time in a polynomial way, i.e. yt ∼ 1 + Gg=1 γgt g St ;
3. a level shift in an unknown time point 2 ≤ δ 2 ≤ T, i.e. δ 1 I(t ≥ δ 2 ) with I(.) the indicator function.

The general model is thus of the form


     

A 
B
2π b 2π b 
G
yt = αat + a
βb,1 cos t + βb,2 sin t 1+ γgt g + δ1 I (t  δ2 ) + εt . (2)
12 12
a=0 b=1 g=1

One may assume that the irregular component ε t of the non-outliers is a stationary random process with E[εt ] = 0 and σ 2 =
V ar[εt ] < ∞. Let us collect all unknown parameters in a vector θ = (α0 , α1 , . . . , β1,1 , β1,2 , . . . , γ1 , γ2 , . . . , δ1 , δ2 ) of length p.
Then model (2) can be written as
yt = f (θ , t ) + εt
 
with f (θ , t ) = Aa=0 αa t a + St (1 + Gg=1 γgt g ) + δ1 I (t  δ2 ) . The model does not need to contain all of these components, as
some coefficients can be zero.

2.2. The nonlinear LTS estimator

Model (2) is nonlinear in the parameters β b, 1 , β b, 2 , γ g and δ 2 . As there may be outliers in the time series, we propose
to estimate θ by means of the nonlinear least trimmed squares (NLTS) estimator (Rousseeuw, 1984; Stromberg and Ruppert,
1992; Stromberg, 1993):


h
θˆNLTS = argmin r(2j ) (θ ) (3)
θ j=1

where T/2 ≤ h < T and r(2j ) (θ ) is the j-th smallest squared residual (yt − f (θ , t ))2 . Our default choice for h is [0.75 T].

The n-consistency and asymptotic normality of NLTS were studied by Čížek (20 05, 20 08). To compute the estimator,
we propose to combine ideas from the FastLTS algorithm for robust linear regression (Rousseeuw and Van Driessen, 2006)
with the alternating least squares (ALS) method.
We first describe how we use the alternating least squares procedure. We temporarily assume that the estimated shift
time δˆ2 is fixed, and that we want to solve (3) for a subset of the yt with at least p − 1 observations, at least one of
which is to the left of δˆ2 and at least one of which is equal or to the right of δˆ2 . We denote the indices of the subset
as E ⊂ {1, 2, . . . , T } with #E  p − 1 , where E must overlap with {1, . . . , δˆ2 − 1} as well as {δˆ2 , . . . , T }. These conditions are
required to make the parameters in (2) identifiable from the subset yE = {yt ; t ∈ E }. We then go through the following steps:

1. [Initialization] Set γg = 0 for g = 1, . . . , G. Then a part of (2) drops out, leaving


    

A 
B
2π b 2π b
yt = αat a + βb,1 cos t + βb,2 sin t + δ1 I (t  δˆ2 ) + εt (4)
12 12
a=0 b=1

which is linear in the parameters α a , β b, 1 , β b, 2 and δ 1 . By applying linear LS to the subset yE , we obtain the initial
estimates αˆ (0 ) , βˆ (0 ) , βˆ (0 ) and δˆ (0 ) .
a b,1 b,2 1
2. [Iteration] For k = 1, 2, . . . repeat the following steps:
  πb   πb 
• [ALS step A] Let St(k−1 ) = Bb=1 βˆb, (k−1 )
1
cos 212 (k−1 )
t + βˆb, 2
sin 212 t (k−1 )
in which the coefficients βˆb, 1
(k−1 )
and βˆb, 2
come

from the previous step. Keeping St(k−1 ) fixed yields the model


A 
G
yt − St(k−1) = αat a + St(k−1) γgt g + δ1 I (t  δˆ2 ) + εt (5)
a=0 g=1

which is linear in the parameters α a , γ g , and δ 1 . We then apply LS using only the observations in the subset yE ,
yielding the estimates αˆ (k ) , γˆ (k ) and δˆ (k ) .
a g 1
• [ALS step B] Keeping the estimated coefficients αˆ a(k ) , γˆg(k ) and δˆ1(k ) from the previous step fixed yields the model
     

A 
B
2π b 2π b 
G
yt − αˆ a(k)t a − δˆ1(k) I (t  δˆ2 ) = βb,1 cos t + βb,2 sin t 1+ γˆg(k)t g + εt (6)
12 12
a=0 b=1 g=1
P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121 111

which is linear in the parameters β b, 1 and β b, 2 . We then apply LS using only the observations in the subset yE ,
(k ) (k )
yielding the estimates βˆb, 1
and βˆb, 2
. Then we go back to ALS step A.

Let θˆk be the vector of coefficients after iteration step k. We repeat the above steps until ||θˆk − θˆk−1 ||/||θˆk−1 || is below a
threshold, or a maximal number of iterations (say 50) is attained. Here || · || is the Euclidean norm.
In words, ALS solves the nonlinear LS problem of fitting (2) to the data set yE by alternating between the solution of two
linear LS fits, (5) and (6).
Our goal is to solve the nonlinear LTS problem (3). A basic tool for linear LTS is the C-step (Rousseeuw and Van Driessen,
2006) which we now generalize to the nonlinear setting.
[C-step] Start from a subset H (k ) ⊂ {1, 2, . . . , T } to which we fit θˆ (k ) obtained by applying ALS. Then compute the residuals
rt = yt − f (θˆ (k ) , t ) for the whole time series, that is, for t = 1, . . . , T and not just for t ∈ H(k) . Next retain the h observations
with smallest squared residuals, yielding the new subset H (k+1 ) . Then apply ALS to H (k+1 ) , yielding a new fit θˆ (k+1 ) . It is
shown in the Appendix that the new fit θˆ (k+1 ) is guaranteed to have a lower objective function than the old fit θˆ (k ) . It is
possible to iterate the C-step until convergence, which will occur in a finite number of steps.
Using these building blocks, we now describe the entire algorithm to compute the NLTS fit to the model (2). Let
t(1 ) , . . . , t(S ) be the ordered indices of the possible positions δ 2 of the level shift, for example the set {u + 1, . . . , T − u}
for some u > 0. The algorithm then consists of the following steps:

1. Loop over all t(s) where s = 1, . . . , S and do:


(a) Temporarily set δˆ2 = t . (s )
(b) Now loop over m ranging from 1 to the number of trial subsets M, and do:
(1) Construct an elemental subset E containing p − 1 different observations. This subset should contain the index t(s) ,
one observation yt with t < t(s) and p − 3 observations drawn at random from the whole time series. Note that we
impose that t belongs to E because the purpose of step 1 is to select the most suitable δˆ2 = t .
(s) (s )
(2) Run the initialization and ALS steps described above on E, keeping δˆ2 = t(s ) fixed. Then take two C-steps. [Two C-
steps is enough at this stage, in line with the results of Rousseeuw and Van Driessen (2006).] If a singular solution
is obtained during the computations, restart without increasing m.
The choice of M is a compromise since the expected number of outlier-free subsets E is proportional to M but the
computation speed is inversely proportional to M. In our experiments we found that M = 250 was sufficient to obtain
stable results.
(c) Consider only the nbest elemental subsets (among the M that were tried) that yielded the lowest objective function
so far. Apply C-steps to them until convergence and store these nbest solutions. In our examples we found that setting
nbest to 10 worked well.
(d) If s > 1 also start from the nbest elemental sets found when investigating t(s−1 ) , but this time setting δˆ2 = t(s ) . Apply
C-steps to them until convergence.
(e) Take the fit with the lowest objective among these 2 × nbest candidates, and denote it by θˆ (s ) .
(f) Store the corresponding scaled residuals

rt (θˆ (s ) )
r˜t (θˆ (s ) ) =  for t = 1, . . . , T . (7)
h 2
t=1 r(t ) (θˆ (s) )/h
h
2. Retain overall best solution. Among the fits θˆ (s ) for s = 1, . . . , S take the one with lowest objective function t=1 r(2t ) (θˆ (s ) )
and denote it by θˆ opt .
h
For estimating the scale of the error term we can use t=1 r(2t ) (θˆ opt ) . But since this sum of squares only uses the h most
central residuals, the estimate needs to be rescaled. The variance σ 2 (h) of a truncated normal distribution containing the
central h/T portion of the standard normal is
    
2T T +h T +h
σ 2
(h ) = 1 − −1
φ −1
h 2T 2T

by equation (6.5) in (Croux and Rousseeuw, 1992). Therefore we compute


h
σ˜ 2 = r(2t ) (θˆ opt )/(hσ 2 (h )) . (8)
t=1

Note that this makes σ˜ 2 consistent, but not yet unbiased for small samples. Therefore we include the finite-sample
correction factor from Pison et al. (2002) in our final scale estimate σˆ .
3. Locally improving the shift position estimate. The previous steps have yielded an estimate δˆ2 of the position of the level
shift, but it may be imprecise. For instance, it may happen that the h-subset underlying θˆ opt does not itself contain the
time points δˆ2 or δˆ2 + 1 . In order to improve the estimate we check in its vicinity as follows:
112 P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121

Fig. 2. Airline data: observed and fitted values based on model (2) with a quadratic trend, a quarterly seasonal component, and a quadratically varying
amplitude.

• Take a window W around δˆ2 . For each t∗ in W, we replace δˆ2 by t∗ while keeping the other coefficients from θˆ opt

and the scale estimate σˆ . Compute the residuals rt from these coefficients and let f (t ∗ ) = t∈W ρ (rt /σˆ ) with ρ the
Huber function

x2 /2 if |x|  b
ρ (x ) =
b|x| − b2 /2 if |x| > b
In our simulations and the analysis of international trade time series (of the kind given in Fig. 1) the best results
were obtained with b equal to 1.5 or 2. In our implementation the defaults are b = 2 and a window W of width 15.
• Our final δˆ2 is the t∗ in W with lowest f(t∗ ). If it is different from the estimate we had before, we recompute the
scaled residuals.
4. Outlier detection. We apply the univariate outlier detection procedure described in Gervini and Yohai (2002) and
Agostinelli et al. (2015) to the T scaled residuals (yt − f (θˆ opt , t ))/σˆ . By default we use the 99% confidence level. Al-
ternatively, one could use the thresholds obtained in Salini et al. (2015).
5. Final fit. We apply nonlinear LS to all the points that have not been flagged as outliers in the previous step, starting from
the initial estimate θˆ opt and keeping δˆ2 fixed. For this we can iterate ALS steps until convergence. The standard errors
obtained in the last two ALS steps can be used for inference.

Note that h must be at least the number of parameters p in the model for identifiability. When h is as low as T/2 this means
T/p > 2. However, for stability it is often recommended that T/p > 5, see e.g. Rousseeuw and Leroy (1987). The Matlab code
of the algorithm can be downloaded from /www.riani.it/rprh/ or /rosa.unipr.it/fsda.html as part of the FSDA toolbox. In the
following sections we will apply it to several data sets.

3. Airline data and the double wedge plot

The airline passenger data, given as Series G in Box and Jenkins (1976), has often been used in the time series analysis
literature as an example of a nonstationary seasonal time series. It consists of T = 144 monthly total numbers of airline
passengers from January 1949 to December 1960. Box and Jenkins developed a two-coefficient time series model of fac-
tored form that is now known as the airline model. In this section we will analyze these data using our method, and then
contaminate the data in various ways to see how the method reacts.
Uncontaminated data. We fit the data by model (2) with A = 2, B = 4 and G = 2. This means that we assume a quadratic
trend, a quarterly seasonal component, and a quadratically varying amplitude. The resulting NLTS fit (3) closely follows the
data, as can be seen in Fig. 2. In this example no data point has been flagged as outlying. From the standard errors (not
shown) we conclude that all coefficients are significant except for the height of the level shift.
Contamination 1. We now contaminate the series by adding three groups of outliers, yielding the blue curve in the bottom
panel of Fig. 3. More precisely, the value 300 is subtracted from all responses in the interval [50,55] while 300 is added on
[122,127] and 400 is subtracted on [130, 134] . The fitted values (dotted curve) from NLTS closely follow the observed values
for the regular observations. The flagged outliers are indicated by red crosses, whose size is proportional to the absolute
P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121 113

Fig. 3. Airline data with contamination 1: double wedge plot (top) and observed and fitted values (bottom).

magnitude of their residual. We see that all the outliers we added are clearly recognized as such, and they were not used to
estimate the coefficients in the weighted step. Only a few regular observations received an absolute residual slightly above
the cutoff value.
The top panel of Fig. 3 is a byproduct of the algorithm, and is useful for visualizing the presence of (groups of) outliers
and a level shift. The first step of the algorithm ranges over all potential positions t(1 ) , . . . , t(S ) of a level shift. These tentative
positions t are on the vertical axis. For any t we plot the absolute scaled residuals |r˜t (θˆ (s ) )| given in (7), in all of the
(s) (s)
times t = 1, 2, . . . , T on the horizontal axis. The color in the plot depends on the size of that absolute residual and ranges
from black (large residuals) over red and yellow to white (small residuals). The color scale is at the right of the plot. Scaled
residuals larger than 50 are shown as if they were 50, so that even a very far outlier cannot affect the color coding. In the
same spirit, uninformative scaled residuals smaller than 2.5 are shown as if they were 0, so in white. Of course the user can
easily modify these default choices.
Outliers have a large absolute scaled residual from the robust fit, so in this plot isolated outliers will appear as dark
vertical lines, and groups of consecutive outliers as dark vertical bands. In this example we clearly see the contamination.
The regular observations with scaled residual slightly above 2.5 do not stand out as they are in light yellow.
Contamination 2. In the second contamination setting we introduce a persistent level shift and three isolated outliers,
two of which lie in the proximity of the level shift which makes the problem harder. For this we added the value 1300 to
all responses from t = 68 onward, at t = 45 the response is lowered by 800, at t = 67 by 600, while at t = 68 and t = 69
we added an additional 800.
The bottom panel of Fig. 4 shows the observed and fitted values. Again all inserted outliers are clearly detected, and a
few regular observations have small crosses indicating that their scaled absolute residual was slightly above 2.5.
The plot of the absolute scaled residuals |r˜t (θˆ (s ) )| in the top panel of Fig. 4 now looks more eventful with two dark
triangles. Together these ‘wedges’ signal a level shift. To understand this effect, let us assume that the true level shift is at
position t∗ and the algorithm is in the process of checking the candidate t(s ) = t ∗ − r. Then the algorithm will treat the yt at
t ∗ − r + 1, . . . , t ∗ − 1 as outliers and the resulting robust fit (still for that t(s) ) will show r − 1 consecutive outliers. Similarly,
when the algorithm tries t(s ) = t ∗ + r to the right of t∗ , the best solutions will show r outliers. As a result, when approaching
the true level shift position t∗ from the left the scaled residuals we are monitoring will form a dark upward-pointing wedge,
and to the right of the true t∗ we obtain an analogous wedge pointing downward. In the top panel of Fig. 4. we observe two
opposite wedges tapering off in the proximity of the true level shift position, around t = 68. In this region we observe a
small rectangle (centered at position 68) bridging the two wedges. The rectangle is due to the two outliers in the proximity
of the level shift. The isolated outlier at position 45 yields a single dark vertical line like those in Fig. 3.
h
Panel (a) of Fig. 5 shows the boxplots of the objective function t=1 r(2t ) (θˆ j(s ) ) attained by the 2 × nbest = 20 best so-
lutions θˆ j in step 1(e) of the algorithm. It is thus also a free byproduct of the estimation. If a level shift is present in the
114 P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121

Fig. 4. Airline data with contamination 2.

Fig. 5. Airline data with contamination 2: (a) boxplots of the 20 lowest objective function values attained at each t(s) ; (b) local improvement of the shift
position estimate.

central part of the time series, this plot will typically have a U shape. In this example the lowest values of the trimmed sum
of squared residuals occur in the time range 60–80. The continuous curve which connects the lowest objective value for
each s reaches its global minimum at t(s ) = 73. However, the curve is quite bumpy in that region, with several local minima
and a near-constant stretch on 67–70, so the position of the minimum is not precise. This kind of situation motivated the

local improvement in step 3 of the algorithm. Panel (b) of Fig. 5 shows f (t(s ) ) = t∈W ρ (rt /σˆ ) as a function of the tenta-
tive position t(s) (with ρ the Huber function with b = 2) on the interval 53–82. This curve has a much better determined
minimum, in fact at t = 68, confirming the benefit of the local improvement step.
Contamination 3. In the final contaminated dataset we inserted a level shift and a group of consecutive outliers following
it. To complicate things even more, we also put in a stretch of contamination to the left of the level shift, as well as an
isolated outlier. The bottom panel of Fig. 6 shows the robust fit, which succeeded in recovering the structure and flagging
the outliers. In the top panel of Fig. 6 we see the typical double wedge pattern indicating a level shift. The two reddish bands
flag the groups of consecutive outliers, whereas the single line corresponds to the isolated outlier at t = 90. In this example
the thick end of the upper wedge is yellow, so the absolute scaled residuals are not as large there. This part corresponds
to a tentative level shift of around 100, which is very far from the true one, and in such cases the fit may indeed be quite
different.
P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121 115

Fig. 6. Airline data with contamination 3.

Table 1
Coefficient estimates, t-statistics and p-values for series P12119085_KE_GB
(columns 2–4) and P17049075_UA_LT (columns 5–7).

P12119085_KE_GB P17049075_UA_LT

Coeff t-stat p-values Coeff t-stat p-values

αˆ 0 115.27 25.6 0 55.14 14.3 0


αˆ 1 1.59 5.80 0 0.90 4.52 0
βˆ11 −2.83 −0.72 0.47 15.55 3.75 0.0 0 056
βˆ12 −12.42 −2.65 0.012 3.61 0.85 0.40
βˆ21 −9.07 −1.95 0.059 −32.50 −7.64 0
βˆ22 −22.60 −4.80 0 −16.06 −3.72 0.0 0 061
γˆ1 −0.016 −3.72 0.0 0 061 −0.023 −12.1 0
δˆ1 −112.62 −14.7 0 −79.41 −13.9 0

4. Analysis of trade data

Our main goal is to analyze the many short time series of trade described in Section 1. After trying several model
specifications in the class (2) we found that the best results were obtained by using a linear trend, two harmonics, and one
parameter to model the varying amplitude of the seasonal component, that is, A = 1, B = 2 and G = 1. Note that this yields
p = 9 parameters including the position and height of a potential level shift, which is not too many compared to the length
of the time series (T = 48). As an example we now apply our method to the time series in Fig. 1.
The robust fit to series P12119085-KE-GB (bottom panel of Fig. 7) suggests three moderate outliers in positions 1, 9 and
15. The fit closely matches the level shift which is therefore well captured. The double wedge plot in the top panel of
Fig. 7 has two wedges which point to a level shift position around 27–28. The local refinement step selects position t = 27.
Columns 2–4 of Table 1 show the coefficients of the final fit together with their t-statistics and p-values. Most coefficients
are significant, and in particular the t-statistic of the height of the level shift is quite large with |t | = 14.7 . This drop looks
anomalous because in the period considered, Kenya was the only country of the East African Community (EAC) paying high
European import duties on flowers and related products including CN 12119085. On the other hand, Kenya is the third
largest exporter of cut flowers in the world. One would therefore check for a simultaneous upward level shift in an EAC
country not paying import duties, which could point to a misdeclaration of origin.
Fig. 8 shows the results for the second series, P17049075_UA_LT. The double wedge plot indicates the presence of a
level shift around position 35. The local refinement yields the position t = 34. Interestingly, there is a reddish line right
116 P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121

Fig. 7. P12119085_KE_GB: double wedge plot (top) and observed and fitted values (bottom).

before the level shift. This is due to an outlier in position 32 which gets a red cross in the bottom panel of the figure.
The double wedge plot also reveals a yellow strip at positions 29 and 30, indicating two less extreme outliers. Finally, the
plot also shows some small reddish areas that correspond to local irregularities, for instance observations 4, 5, 17 and 18
which are flagged as outliers in the bottom panel. Columns 5–7 of Table 1 list the coefficients of the final fit. Also here most
coefficients are strongly significant.
In this case the level shift might point to a different type of violation. The market of sugar and high-sugar-content
products, such as CN code 17049075, is very restricted and regulated. The EU applies country-specific quotas for these
products, with lower import duty for imports below the quota and a higher duty beyond this limit (tariff rate quotas).
Therefore, it would be in an exporter’s interest to circumvent the quota by mislabeling this product as a somewhat related
product that is not under surveillance. In this situation one would check for upward level shifts in related products from
the same country.
Note that the t-values and p-values provided by LTS can help select a model. Table 1 fits model (2) with A = 1 (linear
trend), B = 2 (two harmonics), and G = 1 (the amplitude varies linearly). If we increase A we find that a quadratic trend
is insignificant, and the same for increasing G. The t-values indicate that there is enough evidence for a 6-month seasonal
effect but are less clear on the question whether B should be increased further for these short time series. In any case the
detection of the level shift turns out to be stable as a function of B here.

5. Comparison with other methods

We now compare our results with those obtained by the nonparametric method introduced by Fried (2004) and Fried
and Gather (2007) for robust filtering of time series. For this we used the function robust.filter from the R package
robfilter of Fried et al. (2012). The robust fitting methods are applied to a moving time window of size width, which
needs to be an odd number.
Tables 2 and 3 report the position of the level shift(s) and outlier(s) detected with all default options and various choices
of window widths. We also tested different robust choices for the trend and scale estimation and some values for the adapt
option which adapts the moving window width, with similar results.
Fig. 9(a) and (b) show the resulting fits obtained by the nonparametric filter, for widths giving rise to the detection of
a level shift (width = 7 for P_12119085_KE_GB and width = 11 for P_17049075_UA_LT). In the first series we see that the
level shift is detected well for the appropriate width, but the fit itself is not as tight. Also in the second series a reasonable
level shift position is found but the fit is not that close to the series. This can be explained by the fact that a nonparametric
method has no prior knowledge about the data as it has to work on any data set, whereas our parametric model benefits
from knowledge about the typical behavior of trade time series. In that sense the comparison is not entirely fair.
P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121 117

Fig. 8. P17049075_UA_LT: double wedge plot (top) and observed and fitted values (bottom).

Table 2
P_12119085_KE_GB: Positions of level shifts and outliers detected by a
nonparametric time series filter, using different window widths.

Window width Level shift position(s) Outlier position(s)

3 – [10, 16, 17, 37, 38, 46, 47]


5 – [16, 17, 18, 47]
7 [27] [17, 18]
9 [27, 37] –
11 [27] –

Table 3
P_17049075_UA_LT: Positions of level shifts and outliers detected by a nonparametric
time series filter, using different window widths.

Window width Level shift position(s) Outlier position(s)

3 – [2, 15, 44]


5 – [15, 16, 17, 28, 30, 31, 33, 39, 40, 41]
7 – [30, 31, 33, 34]
9 – [30, 31, 33, 34, 35]
11 [30] [32]

We also run the well-known X-13 ARIMA-SEATS method (Findley et al., 1998; U.S. Census Bureau, 2017) on both trade
time series, by means of the R package seasonal (Sax, 2017) which interfaces X-13. This method fits an ARIMA model with
a seasonal component. In additional to the coefficients required for the ARIMA model, X-13 has T additional parameters for
level shifts, one at each time point t = 1, . . . , T , plus T parameters for additive outliers (AO). Their coefficients are estimated
by stepwise regression, so most of them remain zero. For detecting isolated outliers, i.e. outliers surrounded by non-outlying
values, this approach works quite well.
Fig. 10 shows the X-13 fit to the trade series P12119085_KE_GB in the lower panel, with the LTS fit in the upper panel
for comparison. Fig. 11 does the same for P17049075_UA_LT. The blue curves are the time series, and the fits are in black. In
both cases X-13 does detect the level shift. It obtains the model (0 1 1) which only has a moving average and no seasonal
component. As a result its forecast (shown in red) has no seasonal component either. Note that in Fig. 10 the 90% tolerance
band around the forecast is much wider for X-13 than for LTS.
Let us now return to the airline data with contamination 1 described in Section 3. The results of LTS were shown in
Fig. 3. We now apply X-13 to it. The R-code and output are available in Section A.1 of the Supplementary Material. The
118 P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121

Fig. 9. Fits obtained by a nonparametric time series filter for (a) P12119085_KE_GB; (b) P17049075_UA_LT .

Fig. 10. Trade series P12119085_KE_GB: time series (blue), fit (black) and forecast (red) obtained by LTS (top panel) and X-13 (bottom panel). (For inter-
pretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

model found by X-13 is ARIMA with (1 1 0)(0 1 0) whereas for the uncontaminated airline data it was (0 1 1)(0 1 1). In
this example the X-13 fit has 7 nonzero coefficients describing level shifts, and 1 nonzero coefficient for an AO outlier. The
bottom panel of Fig. 12 shows the time series and the X-13 fit which accommodates the outliers. On the other hand, the
LTS fit in the top panel follows the pattern of the majority of the data, so the outliers have large residuals from it. Also the
forecasts are quite different: those of LTS increase and have a narrow tolerance band, while those of X-13 slightly decrease
and have wide tolerance bands. For the uncontaminated airline data (that is, without outliers) the forecasts and tolerance
bands of LTS and X-13 were very similar.
Note that X-13 fits each set of consecutive outliers by a level shift at the start and a level shift afterward. That description
is indeed equivalent to the consecutive outliers representation. Our point is that accommodating the outliers gives a close
fit to the observed time series, but as we see here it can inflate the forecast band.
On the airline data without outliers, X-13 automatically log transforms the data before fitting it. On the airline data with
contamination it does not, because the time series contains at least one negative value. (Other transformations would be
possible, but in automatic mode X-13 only considers the log transform.) In this example the negative values were due to
outliers, but in fact many trade time series in the EU database have at least one zero value, correctly reflecting that a certain
product was not imported for a month, which will also prevent X-13 from transforming the data.
P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121 119

Fig. 11. Trade series P17049075_UA_LT: time series (blue), fit (black) and forecast (red) obtained by LTS (top panel) and X-13 (bottom panel). (For interpre-
tation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 12. Airline data with contamination 1: time series (blue), fit (black) and forecast (red) obtained by LTS (top panel) and X-13 (bottom panel). (For
interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

To investigate this issue further, we looked at two ways to make the contaminated airline data positive. The first was to
add a constant so that the minimum of the contaminated time series becomes 1 (we also tried 5, 10 and 50). The second
was to truncate the series from below at 1 (or 5, 10, 50) so the downward outliers of contamination 1 remain visible.
However, in none of these cases did X-13 carry out a logarithmic transform, indicating that its transformation criterion was
affected by the outliers.
Also note that the outliers have a large magnitude in this example. In response to a referee request we also provide an
example with a level shift that is smaller than the seasonal component, in Section A.2 of the Supplementary Material.
120 P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121

6. The case of several level shifts

Our basic model (2) only covers the situation where at most one level shift occurs, which is a reasonable assumption for
short time series. When several level shifts can occur, we first apply our approach to the original time series. If it detects a
level shift we can modify the time series by undoing the break, that is, subtract δˆ1 from all yt to the right of the level shift,
after which one can search for the next level shift, and so on. A detailed example of this procedure is shown in subsection
A.3 of the Supplementary Material.

7. Conclusions and outlook

We have introduced a new robust approach to model and monitor nonlinear time series with possible level shifts. A
fast algorithm was developed and applied to several real and artificial datasets. We also proposed a new graphical display,
the double wedge plot, which visualizes the possible presence of a level shift as well as outliers. This graph requires no
additional computation as it is an automatic by-product of the estimation. Our approach thus allows to automatically flag
outlying measurements and to detect a level shift, which is important in fraud detection as these may be indications of
unauthorized transactions. At the European Joint Research Centre, this methodology was validated by comparing its results
to those of visual inspection of many trade series by subject-matter experts.

Appendix

Here we prove that a C-step (as used in the first step of the NLTS algorithm) can only decrease the LTS objective function.
Let H(k) be the current h-subset with its corresponding nonlinear LS coefficients θˆ (k ) = ({αˆ (k ) }, {βˆ (k ) }, {γˆ (k ) }, δˆ1(k ) , δˆ2 ) and

objective function L(k ) = t∈H (k) r(2t ) (θˆ (k ) ).
Now consider H (k+1 ) , the h-subset which contains the h observations with smallest squared residual with respect to θˆ (k ) .
Then by construction
 
r(2t ) (θˆ (k ) )  r(2t ) (θˆ (k ) ) = L(k ) . (9)
t∈H (k+1) t∈H (k )

The ALS step A then yields θˆ (k+0.5 ) = ({αˆ (k+1 ) }, {βˆ (k ) }, {γˆ (k+1 ) }, δˆ1(k+1 ) , δˆ2 ). Since it is the LS solution of the linear
model (5),
 
r(2t ) (θˆ (k+0.5) )  r(2t ) (θˆ (k ) ) . (10)
t∈H (k+1) t∈H (k+1)

Next, ALS step B yields θˆ (k+1 ) = ({αˆ (k+1 ) }, {βˆ (k+1 ) }, {γˆ (k+1 ) }, δˆ1(k+1 ) , δˆ2 ) with
 
L(k+1) = r(2t ) (θˆ (k+1) )  r(2t ) (θˆ (k+0.5) ) . (11)
t∈H (k+1) t∈H (k+1)

Combining (9)–(11) yields


L(k+1)  L(k )
so the new h-subset H (k+1 ) has an objective function that is less than or equal to that of H(k) . Note that the only way to
obtain equality is if no coefficients have changed, in which case the iteration stops.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.ecosta.2018.05.
001. The supplementary material to this paper contains some R code and worked-out examples.

References

Agostinelli, C., Leung, A., Yohai, V., Zamar, R., 2015. Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contam-
ination. Test 24, 441–461.
Barabesi, L., Cerasa, A., Perrotta, D., Cerioli, A., 2016. Modelling international trade data with the Tweedie distribution for anti-fraud and policy support.
Eur. J. Oper. Res. 258, 1031–1043.
Bianco, A.M., Garca Ben, M., Martnez, E.J., Yohai, V.J., 2001. Outlier detection in regression models with ARIMA errors using robust estimates. J. Forecast. 20
(8), 565–579.
Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. Holden Day.
Čížek, P., 2005. Least trimmed squares in nonlinear regression under dependence. J. Stat. Plan. Inference 136, 3967–3988.
Čížek, P., 2008. General trimmed estimation: robust approach to nonlinear and limited dependent variable models. Econom. Theory 24 (6), 1500–1529.
Croux, C., Rousseeuw, P.J., 1992. A class of high-breakdown scale estimators based on subranges. Commun. Stat. - Theory Methods 21, 1935–1951.
Findley, D.F., Monsell, B.C., Bell, W.R., Otto, M.C., Chen, B.C., 1998. New capabilities of the X-12-ARIMA seasonal adjustment program. J. Bus. Econ. Stat. 16,
127–177.
Fried, R., 2004. Robust filtering of time series with trends. J. Nonparametric Stat. 16 (3–4), 313–328.
P. Rousseeuw et al. / Econometrics and Statistics 9 (2019) 108–121 121

Fried, R., Gather, U., 2007. On rank tests for shift detection in time series. Comput. Stat. Data Anal. 52 (1), 221–233.
Fried, R., Schettlinger, K., & Borowski, M. (2012). Package ‘robfilter’. CRAN, URL https://CRAN.R-project.org/package=robfilter.
Galeano, P., Peña, D., 2013. Finding outliers and unexpected events in linear and nonlinear possible multivariate time series data. In: Becker, C., Fried, R.,
Kuhnt, S. (Eds.), Robustness and Complex Data Structures. Festschrift in Honour of Ursula Gather. Springer, pp. 255–273.
Gervini, D., Yohai, V.J., 2002. A class of robust and fully efficient regression estimators. Ann. Stat. 30, 583–616.
Pison, G., Van Aelst, S., Willems, G., 2002. Small sample corrections for LTS and MCD. Metrika 55, 111–123.
Rousseeuw, P., 1984. Least median of squares regression. J. Am. Stat. Assoc. 79, 871–880.
Rousseeuw, P., Hubert, M., 2018. Anomaly detection by robust statistics. WIREs Data Min. Knowl. Discov. 8 (2), e1236.
Rousseeuw, P., Leroy, A., 1987. Robust Regression and Outlier Detection. Wiley-Interscience, New York.
Rousseeuw, P., Van Driessen, K., 2006. Computing LTS regression for large data sets. Data Min. Knowl. Discov. 12, 29–45.
Salini, S., Cerioli, A., Laurini, F., Riani, M., 2015. Reliable robust regression diagnostics. Int. Stat. Rev. 84, 99–127.
Sax, C. (2017). Package ‘seasonal’: R interface to X-13-ARIMA-SEATS. CRAN, URL https://CRAN.R-project.org/package=seasonal.
Stromberg, A., 1993. Computation of high breakdown nonlinear regression parameters. J. Am. Stat. Assoc. 88, 237–244.
Stromberg, A., Ruppert, D., 1992. Breakdown in nonlinear regression. J. Am. Stat. Assoc. 87, 991–997.
U.S. Census Bureau (2017). X-13 ARIMA-SEATS Reference Manual, Version 1.1. URL http://www.census.gov/srd/www/x13as.

You might also like