Abstract
Density-based clustering is able to find clusters of arbitrary sizes and shapes while effectively separating noise. Despite its advantage over other types of clustering, it is well-known that most density-based algorithms face the same challenge of finding clusters with varied densities. Recently, ReScale, a principled density-ratio preprocessing technique, enables a density-based clustering algorithm to identify clusters with varied densities. However, because the technique is based on one-dimensional scaling, it does not do well in datasets which require multi-dimensional scaling. In this paper, we propose a multi-dimensional scaling method, named DScale, which rescales based on the computed distance. It overcomes the key weakness of ReScale and requires one less parameter while maintaining the simplicity of the implementation. Our empirical evaluation shows that DScale has better clustering performance than ReScale for three existing density-based algorithms, i.e., DBSCAN, OPTICS and DP, on synthetic and real-world datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman and Hall/CRC Press, Boca Raton (2013)
Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. SIGMOD 1999, pp. 49–60. ACM, New York (1999)
Borg, I., Groenen, P.J., Mair, P.: Applied Multidimensional Scaling. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31848-1
Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2003 SIAM International Conference on Data Mining, pp. 47–58. SIAM (2003)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 226–231. AAAI Press (1996)
Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2011)
Hinneburg, A., Gabriel, H.-H.: DENCLUE 2.0: fast clustering based on kernel density estimation. In: R. Berthold, M., Shawe-Taylor, J., Lavrač, N. (eds.) IDA 2007. LNCS, vol. 4723, pp. 70–80. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74825-0_7
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (1990)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Pearson, E.S.: The probability integral transformation for testing goodness of fit and combining independent tests of significance. Biometrika 30(1/2), 134–148 (1938)
Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)
Zhu, Y., Ting, K.M., Carman, M.J.: Density-ratio based clustering for discovering clusters with varying densities. Pattern Recogn. 60, 983–997 (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Zhu, Y., Ting, K.M., Angelova, M. (2018). A Distance Scaling Method to Improve Density-Based Clustering. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science(), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_31
Download citation
DOI: https://doi.org/10.1007/978-3-319-93040-4_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4
eBook Packages: Computer ScienceComputer Science (R0)