Abstract
The previous decade has brought a remarkable increase of the interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this article, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Aßfalg J, Kriegel H-P, Kröger P, Kunath P, Pryakhin A, Renz M (2006) Similarity search on time series based on threshold queries. In: EDBT
Aßfalg J, Kriegel H-P, Kroger P, Kunath P, Pryakhin A, Renz M (2008) Similarity search in multimedia time series data using amplitude-level features. In: MMM’08, pp 123–133
Additional experiment results for representation and similarity measures of time series. http://www.ece.northwestern.edu/~hdi117/tsim.htm
Alon J, Athitsos V, Sclaroff S (2005) Online and offline character recognition using alignment to prototypes. In: ICDAR’05, pp 839–845
André-Jönsson H, Badal DZ (1997) Using signature files for querying time-series data. In: PKDD
Assent I, Wichterich M, Krieger R, Kremer H, Seidl T (2009) Anticipatory dtw for efficient similarity search in time series databases. PVLDB 2(1): 826–837
Bennet B, Galton A (2004) A unifying semantics for time and events. Artif Intell 153(1–2): 13–48
Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: KDD workshop, pp 359–370
Cai Y, Ng RT (2004) Indexing spatio-temporal trajectories with chebyshev polynomials. In: SIGMOD conference
Cardle M (2004) Automated motion EDITInG. In: Technical report, Computer Laboratory, University of Cambridge, Cambridge
Chan K-p, Fu AW-C (1999) Efficient time series matching by wavelets. In: ICDE
Chen L, Ng RT (2004) On the marriage of lp-norms and edit distance. In: VLDB
Chen L, Özsu MT, Oria V (2005a) Robust and fast similarity search for moving object trajectories. In: SIGMOD conference
Chen L, Özsu MT, Oria V (2005b) Using multi-scale histograms to answer pattern existence and shape match queries. In: SSDBM
Chen Q, Chen L, Lian X, Liu Y, Yu JX (2007a) Indexable PLA for efficient similarity search. In: VLDB
Chen Y, Nascimento MA, Ooi BC, Tung AKH (2007b) SpADe: On shape-based pattern detection in streaming time series. In: ICDE
Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. In: SIGMOD conference
Flato E (2000) Robust and efficient computation of planar minkowski sums. Master’s thesis, School of Exact Sciences, Tel-Aviv University
Frentzos E, Gratsias K, Theodoridis Y (2007) Index-based most similar trajectory search. In: ICDE
Geurts P (2001) Pattern extraction for time series classification. In: PKDD
Geurts P (2002) Contributions to decision tree induction: bias/variance tradeoff and time series classification. PhD thesis, University of Liège, Belgium
Jia S, Qian Y, Dai G (2004) An advanced segmental semi-markov model based online series pattern detection. In: ICPR (3)’04, pp 634–637
Jiawei H, Kamber M (2005) Data mining: concepts and techniques. Morgan Kaufmann Publishers, California
Karamitopoulos L, Evangelidis G (2009) A dispersion-based paa representation for time series. In: CSIE (4), pp 490–494
Karydis I, Nanopoulos A, Papadopoulos AN, Manolopoulos Y (2005) Evaluation of similarity searching methods for music data in P2P networks. IJBIDM 1(2)
Kawagoe K, Ueda T (2002) a similarity search method of time series data with combination of Fourier and wavelet transforms. In: TIME
Keogh EJ (2002) Exact indexing of dynamic time warping. In: VLDB
Keogh EJ (2006) A decade of progress in indexing and mining large time series databases. In: VLDB
Keogh EJ, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4): 349–371
Keogh EJ, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3): 358–386
Keogh EJ, Chakrabarti K, Mehrotra S, Pazzani MJ (2001a) Locally adaptive dimensionality reduction for indexing large time series databases. In: SIGMOD conference, pp 151–162
Keogh EJ, Chakrabarti K, Pazzani MJ, Mehrotra S (2001b) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3): 263–286
Keogh E, Xi X, Wei L, Ratanamahatana C (2006) The UCR time series dataset. http://www.cs.ucr.edu/~eamonn/time_series_data/
Keogh EJ, Wei L, Xi X, Vlachos M, Lee S-H, Protopapas P (2009) Supporting exact indexing of arbitrarily rotated shapes and periodic time series under euclidean and warping distance measures. VLDB J 18(3):611–630
Kim S-W, Park S, Chu WW (2001) An index-based approach for similarity search supporting time warping in large sequence databases. In: ICDE
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI
Korn F, Jagadish HV, Faloutsos C (1997) Efficiently supporting ad hoc queries in large datasets of time sequences. In: SIGMOD conference
Kumar A, Jawahar CV, Manmatha R (2007) Efficient search in document image collections. In: ACCV (1)’07, pp 586–595
Lemire D (2009) Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern recognition, pp 2169–2180
Lin Y (2006) Efficient human motion retrieval in large databases. In: GRAPHITE, pp 31–37
Lin J, Keogh EJ, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Discov 15(2): 107–144
Morse MD, Patel JM (2007) An efficient and accurate method for evaluating time series similarity. In: SIGMOD conference
Ng RT (2006) Note of caution. http://www.cs.ubc.ca/~rng/psdepository/chebyReport2.pdf
Olofsson P (2005) Probability, statistics and stochastic processes. Wiley-Interscience, Hoboken
Papadopoulos AN (2008) Trajectory retrieval with latent semantic analysis. In: SAC’08, pp 1089–1094
Park S, Kim S-W (2006) Prefix-querying with an 11 distance metric for time-series subsequence matching under time warping
Popivanov I, Miller RJ (2002) Similarity search over time-series data using wavelets. In: ICDE
Ratanamahatana CA, Keogh EJ (2005) Three myths about dynamic time warping data mining. In: SDM
Sakurai Y, Yoshikawa M, Faloutsos C (2005) Ftw: fast similarity search under the time warping distance. In: PODS’05, pp 326–337
Salzberg S (1997) On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min Knowl Discov 1(3): 317–328
Steinbach M, Tan P-N, Kumar V, Klooster SA, Potter C (2003) Discovery of climate indices using clustering. In: KDD
Tan P-N, Steinbach M, Kumar V (2005) Introduction to data mining. Addison-Wesley, Reading
Tansel A, Clifford J, Jajodia S, Segev A, Snodgrass R (1993) Temporal databases: theory and implementation. Benjamin/Cummings Publishing Co., Menlo Park
Vlachos M, Gunopulos D, Kollios G (2002) Discovering similar multidimensional trajectories. In: ICDE, pp 673–684
Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh EJ (2006) Indexing multidimensional time-series. VLDB J 15(1): 1–20
Workshop and challenge on time series classification at SIGKDD (2007). http://www.cs.ucr.edu/~eamonn/SIGKDD2007TimeSeries.html
Wu Y-L, Agrawal D, Abbadi AE (2000) A comparison of DFT and DWT based similarity search in time-series databases. In: CIKM
Xi X, Keogh EJ, Shelton CR, Wei L, Ratanamahatana CA (2006) Fast time series classification using numerosity reduction. In: ICML
Yi B-K, Faloutsos C (2000) Fast time sequence indexing for arbitrary Lp norms. In: VLDB
Yi B-K, Jagadish HV, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In: ICDE. IEEE Computer Society
Zhang G HB, Kinsner W (2009) Electrocardiogram data mining based on frame classification by dynamic time warping matching. Comput Methods Biomech Biomed Eng
Zhou M, Wong MH (2007) Boundary-based lower-bound functions for dynamic time warping and their indexing. In: ICDE’07, pp 1307–1311
Zhu Y, Shasha D (2003) Warping indexes with envelope transforms for query by humming. In: SIGMOD conference
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Geoffrey I. Webb.
Research supported by NSF awards 0803410 and 0808770, NSF-CNS grant 0910952.
Rights and permissions
About this article
Cite this article
Wang, X., Mueen, A., Ding, H. et al. Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Disc 26, 275–309 (2013). https://doi.org/10.1007/s10618-012-0250-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-012-0250-5