[go: up one dir, main page]

Skip to main content
Log in

Experimental comparison of representation methods and distance measures for time series data

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

The previous decade has brought a remarkable increase of the interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this article, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  • Aßfalg J, Kriegel H-P, Kröger P, Kunath P, Pryakhin A, Renz M (2006) Similarity search on time series based on threshold queries. In: EDBT

  • Aßfalg J, Kriegel H-P, Kroger P, Kunath P, Pryakhin A, Renz M (2008) Similarity search in multimedia time series data using amplitude-level features. In: MMM’08, pp 123–133

  • Additional experiment results for representation and similarity measures of time series. http://www.ece.northwestern.edu/~hdi117/tsim.htm

  • Alon J, Athitsos V, Sclaroff S (2005) Online and offline character recognition using alignment to prototypes. In: ICDAR’05, pp 839–845

  • André-Jönsson H, Badal DZ (1997) Using signature files for querying time-series data. In: PKDD

  • Assent I, Wichterich M, Krieger R, Kremer H, Seidl T (2009) Anticipatory dtw for efficient similarity search in time series databases. PVLDB 2(1): 826–837

    Google Scholar 

  • Bennet B, Galton A (2004) A unifying semantics for time and events. Artif Intell 153(1–2): 13–48

    Article  Google Scholar 

  • Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: KDD workshop, pp 359–370

  • Cai Y, Ng RT (2004) Indexing spatio-temporal trajectories with chebyshev polynomials. In: SIGMOD conference

  • Cardle M (2004) Automated motion EDITInG. In: Technical report, Computer Laboratory, University of Cambridge, Cambridge

  • Chan K-p, Fu AW-C (1999) Efficient time series matching by wavelets. In: ICDE

  • Chen L, Ng RT (2004) On the marriage of lp-norms and edit distance. In: VLDB

  • Chen L, Özsu MT, Oria V (2005a) Robust and fast similarity search for moving object trajectories. In: SIGMOD conference

  • Chen L, Özsu MT, Oria V (2005b) Using multi-scale histograms to answer pattern existence and shape match queries. In: SSDBM

  • Chen Q, Chen L, Lian X, Liu Y, Yu JX (2007a) Indexable PLA for efficient similarity search. In: VLDB

  • Chen Y, Nascimento MA, Ooi BC, Tung AKH (2007b) SpADe: On shape-based pattern detection in streaming time series. In: ICDE

  • Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York

    MATH  Google Scholar 

  • Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases. In: SIGMOD conference

  • Flato E (2000) Robust and efficient computation of planar minkowski sums. Master’s thesis, School of Exact Sciences, Tel-Aviv University

  • Frentzos E, Gratsias K, Theodoridis Y (2007) Index-based most similar trajectory search. In: ICDE

  • Geurts P (2001) Pattern extraction for time series classification. In: PKDD

  • Geurts P (2002) Contributions to decision tree induction: bias/variance tradeoff and time series classification. PhD thesis, University of Liège, Belgium

  • Jia S, Qian Y, Dai G (2004) An advanced segmental semi-markov model based online series pattern detection. In: ICPR (3)’04, pp 634–637

  • Jiawei H, Kamber M (2005) Data mining: concepts and techniques. Morgan Kaufmann Publishers, California

    Google Scholar 

  • Karamitopoulos L, Evangelidis G (2009) A dispersion-based paa representation for time series. In: CSIE (4), pp 490–494

  • Karydis I, Nanopoulos A, Papadopoulos AN, Manolopoulos Y (2005) Evaluation of similarity searching methods for music data in P2P networks. IJBIDM 1(2)

  • Kawagoe K, Ueda T (2002) a similarity search method of time series data with combination of Fourier and wavelet transforms. In: TIME

  • Keogh EJ (2002) Exact indexing of dynamic time warping. In: VLDB

  • Keogh EJ (2006) A decade of progress in indexing and mining large time series databases. In: VLDB

  • Keogh EJ, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4): 349–371

    Article  MathSciNet  Google Scholar 

  • Keogh EJ, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3): 358–386

    Article  Google Scholar 

  • Keogh EJ, Chakrabarti K, Mehrotra S, Pazzani MJ (2001a) Locally adaptive dimensionality reduction for indexing large time series databases. In: SIGMOD conference, pp 151–162

  • Keogh EJ, Chakrabarti K, Pazzani MJ, Mehrotra S (2001b) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3): 263–286

    Article  MATH  Google Scholar 

  • Keogh E, Xi X, Wei L, Ratanamahatana C (2006) The UCR time series dataset. http://www.cs.ucr.edu/~eamonn/time_series_data/

  • Keogh EJ, Wei L, Xi X, Vlachos M, Lee S-H, Protopapas P (2009) Supporting exact indexing of arbitrarily rotated shapes and periodic time series under euclidean and warping distance measures. VLDB J 18(3):611–630

    Google Scholar 

  • Kim S-W, Park S, Chu WW (2001) An index-based approach for similarity search supporting time warping in large sequence databases. In: ICDE

  • Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI

  • Korn F, Jagadish HV, Faloutsos C (1997) Efficiently supporting ad hoc queries in large datasets of time sequences. In: SIGMOD conference

  • Kumar A, Jawahar CV, Manmatha R (2007) Efficient search in document image collections. In: ACCV (1)’07, pp 586–595

  • Lemire D (2009) Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern recognition, pp 2169–2180

  • Lin Y (2006) Efficient human motion retrieval in large databases. In: GRAPHITE, pp 31–37

  • Lin J, Keogh EJ, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Discov 15(2): 107–144

    Article  MathSciNet  Google Scholar 

  • Morse MD, Patel JM (2007) An efficient and accurate method for evaluating time series similarity. In: SIGMOD conference

  • Ng RT (2006) Note of caution. http://www.cs.ubc.ca/~rng/psdepository/chebyReport2.pdf

  • Olofsson P (2005) Probability, statistics and stochastic processes. Wiley-Interscience, Hoboken

    Book  MATH  Google Scholar 

  • Papadopoulos AN (2008) Trajectory retrieval with latent semantic analysis. In: SAC’08, pp 1089–1094

  • Park S, Kim S-W (2006) Prefix-querying with an 11 distance metric for time-series subsequence matching under time warping

  • Popivanov I, Miller RJ (2002) Similarity search over time-series data using wavelets. In: ICDE

  • Ratanamahatana CA, Keogh EJ (2005) Three myths about dynamic time warping data mining. In: SDM

  • Sakurai Y, Yoshikawa M, Faloutsos C (2005) Ftw: fast similarity search under the time warping distance. In: PODS’05, pp 326–337

  • Salzberg S (1997) On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min Knowl Discov 1(3): 317–328

    Article  Google Scholar 

  • Steinbach M, Tan P-N, Kumar V, Klooster SA, Potter C (2003) Discovery of climate indices using clustering. In: KDD

  • Tan P-N, Steinbach M, Kumar V (2005) Introduction to data mining. Addison-Wesley, Reading

    Google Scholar 

  • Tansel A, Clifford J, Jajodia S, Segev A, Snodgrass R (1993) Temporal databases: theory and implementation. Benjamin/Cummings Publishing Co., Menlo Park

    Google Scholar 

  • Vlachos M, Gunopulos D, Kollios G (2002) Discovering similar multidimensional trajectories. In: ICDE, pp 673–684

  • Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh EJ (2006) Indexing multidimensional time-series. VLDB J 15(1): 1–20

    Article  Google Scholar 

  • Workshop and challenge on time series classification at SIGKDD (2007). http://www.cs.ucr.edu/~eamonn/SIGKDD2007TimeSeries.html

  • Wu Y-L, Agrawal D, Abbadi AE (2000) A comparison of DFT and DWT based similarity search in time-series databases. In: CIKM

  • Xi X, Keogh EJ, Shelton CR, Wei L, Ratanamahatana CA (2006) Fast time series classification using numerosity reduction. In: ICML

  • Yi B-K, Faloutsos C (2000) Fast time sequence indexing for arbitrary Lp norms. In: VLDB

  • Yi B-K, Jagadish HV, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In: ICDE. IEEE Computer Society

  • Zhang G HB, Kinsner W (2009) Electrocardiogram data mining based on frame classification by dynamic time warping matching. Comput Methods Biomech Biomed Eng

  • Zhou M, Wong MH (2007) Boundary-based lower-bound functions for dynamic time warping and their indexing. In: ICDE’07, pp 1307–1311

  • Zhu Y, Shasha D (2003) Warping indexes with envelope transforms for query by humming. In: SIGMOD conference

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiaoyue Wang.

Additional information

Responsible editor: Geoffrey I. Webb.

Research supported by NSF awards 0803410 and 0808770, NSF-CNS grant 0910952.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wang, X., Mueen, A., Ding, H. et al. Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Disc 26, 275–309 (2013). https://doi.org/10.1007/s10618-012-0250-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-012-0250-5

Keywords

Navigation