Abstract
Discovering connected regions in data space is a complex problem that places a heavy burden on the user. Datasets often require preprocessing and postprocessing before they are fit for algorithm and user consumption. Existing clustering algorithms require performance parameters to achieve adequate results; typically, these parameters are either empirically optimized or scanned by brute force, which adds further burden. We present RADDACL2, a density-based clustering algorithm designed to reduce overall user burden. The algorithm requires no input other than the dataset itself to identify clusters. In addition, the algorithm is deterministic: given the same dataset, it always produces the same result. Both of these features reduce user burden by decreasing the number of passes one must make to obtain an outcome. A number of experiments are performed on toy and real datasets to verify the capabilities of RADDACL2 against existing algorithms.
Abbreviations
- Centroid [\({C}_{\mathrm{X}}\)]: The centroid of the current region; the subscript X associates the centroid with a specific region
- Observation [O]: A series of attributes representing a single observation in the dataset
- Region [\({R}_{\mathrm{X}}\)]: A subdivision of the dataset, created during Density Discovery
- Precluster [PC]: A set of related observations that share a very high density, identified during Density Discovery
- Average Absolute Deviation [AAD]: The average of the absolute deviations from a measure of central tendency
- Standard Deviation [STD]: The variability of the data about a measure of central tendency
- Neighborhood Function [NF]: An algorithm used to determine whether a pair of preclusters are neighbors
- Neighborhood Radius [NR]: A threshold assigned to each precluster, used by the neighborhood function to determine whether two preclusters should be merged
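The AAD, STD, and neighborhood entries above can be made concrete with a short sketch. The AAD computation follows the definition given (mean of absolute deviations about a measure of central tendency, here the mean); the `are_neighbors` function and its centroid-distance rule are hypothetical illustrations of how a neighborhood radius might gate a precluster merge, not the paper's exact neighborhood function.

```python
import math
import statistics

def aad(values, center=None):
    """Average Absolute Deviation (AAD): mean of the absolute
    deviations from a measure of central tendency (the mean here)."""
    if center is None:
        center = statistics.mean(values)
    return sum(abs(v - center) for v in values) / len(values)

def are_neighbors(centroid_a, radius_a, centroid_b, radius_b):
    """Hypothetical neighborhood function: treat two preclusters as
    neighbors (merge candidates) when the distance between their
    centroids is within the sum of their neighborhood radii."""
    return math.dist(centroid_a, centroid_b) <= radius_a + radius_b

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(aad(data))                # AAD about the mean -> 1.5
print(statistics.pstdev(data))  # population STD -> 2.0
print(are_neighbors((0.0, 0.0), 1.0, (1.5, 0.0), 1.0))  # True
```

Note that AAD is less sensitive to outliers than STD, since deviations are not squared; for this sample the AAD (1.5) is smaller than the STD (2.0).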
Acknowledgments
The authors wish to thank Mr. Derek Beaton for his comments and suggestions in developing RADDACL2. He is one of the co-authors on the original RADDACL algorithm.
Cite this article
Avila, D., Valova, I. RADDACL2: a recursive approach to discovering density clusters. Prog Artif Intell 4, 21–36 (2015). https://doi.org/10.1007/s13748-015-0066-9