An advancement in clustering via nonparametric density estimation

Giovanna Menardi¹ &
Adelchi Azzalini¹

Abstract

Density-based clustering methods hinge on the idea of associating groups with the connected components of the level sets of the density underlying the data, to be estimated by a nonparametric method. These methods have some desirable properties and generally perform well, but they involve a non-trivial computational effort, required to identify the connected regions. In previous work, the use of a spatial tessellation such as the Delaunay triangulation was proposed, because it suitably generalizes the univariate procedure for detecting the connected components. However, its computational complexity grows exponentially with the dimensionality of the data, making the triangulation infeasible in high dimensions. Our aim is to overcome the limitations of the Delaunay triangulation. We discuss an alternative procedure for identifying the connected regions associated with the level sets of the density. By measuring the extent of possible valleys of the density along the segment connecting pairs of observations, the proposed procedure shifts the formulation from a space of arbitrary dimension to a univariate one, with benefits for both computation and visualization.
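
The idea of reducing the problem to a univariate one can be illustrated with a minimal Python sketch. It is not the authors' implementation (their software is the R package pdfCluster), and it makes several assumptions that the abstract does not state: a Gaussian product kernel with a single fixed bandwidth, a "water-filling" measure of the valley (the area needed to fill the dips of the density profile along the segment, relative to the filled area), and a single threshold applied to all pairs rather than a sweep over the level sets of the density. The names gaussian_kde, valley_extent, cluster_by_valleys and two_blobs are hypothetical.

```python
import itertools

import numpy as np


def gaussian_kde(points, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    norm = n * (bandwidth * np.sqrt(2.0 * np.pi)) ** d

    def density(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        # Squared distances between each query point and each observation.
        sq = ((x[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-0.5 * sq / bandwidth**2).sum(axis=1) / norm

    return density


def valley_extent(density, a, b, n_grid=101):
    """Relative size of the valley of the density along the segment a-b.

    Assumed measure (not taken from the paper): fill every dip of the
    univariate profile up to the lower of the two surrounding peaks and
    report the filled-in area as a fraction of the total filled area.
    The value is 0 when the profile has no dip.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    t = np.linspace(0.0, 1.0, n_grid)[:, None]
    profile = density(a + t * (b - a))  # density along the segment
    left = np.maximum.accumulate(profile)
    right = np.maximum.accumulate(profile[::-1])[::-1]
    filled = np.minimum(left, right)
    # Uniform grid, so ratios of sums approximate ratios of areas.
    return float((filled - profile).sum() / filled.sum())


def cluster_by_valleys(points, bandwidth, threshold):
    """Link pairs whose valley is at most `threshold`; return group labels."""
    points = np.asarray(points, dtype=float)
    density = gaussian_kde(points, bandwidth)
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(points)), 2):
        if valley_extent(density, points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def two_blobs(shift=10.0):
    """Toy data: two 3x3x3 grids of points in three dimensions."""
    blob = np.array(list(itertools.product([-0.5, 0.0, 0.5], repeat=3)))
    return np.vstack([blob, blob + np.array([shift, 0.0, 0.0])])


data = two_blobs()
labels = cluster_by_valleys(data, bandwidth=1.0, threshold=0.1)
print(len(set(labels)), "groups found")
```

Whatever the dimension of the data, each pair of observations is judged from a one-dimensional density profile, which can also be plotted directly. The cost of this sketch grows with the number of pairs rather than with the dimension.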

Notes

http://www.ncdc.noaa.gov/oa/climate/climateproducts.html#PUBS.

References

Abramson, I.S.: On bandwidth variation in kernel estimates—a square root law. Ann. Stat. 10, 1217–1223 (1982)
Azzalini, A., Torelli, N.: Clustering via nonparametric density estimation. Stat. Comput. 17, 71–80 (2007)
Azzalini, A., Menardi, G., Rosolin, T.: R package pdfCluster: cluster analysis via nonparametric density estimation (version 1.0-0) (2012). http://cran.r-project.org/package=pdfCluster
Burman, P., Polonik, W.: Multivariate mode hunting: data analytic tools with measures of significance. J. Multivar. Anal. 100, 1198–1218 (2009)
Cuevas, A., Febrero, M., Fraiman, R.: Estimating the number of clusters. Can. J. Stat. 28, 367–382 (2000)
Cuevas, A., Febrero, M., Fraiman, R.: Cluster analysis: a further approach based on density estimation. Comput. Stat. Data Anal. 36, 441–459 (2001)
Dazard, J.E., Rao, J.S.: Local sparse bump hunting. J. Comput. Graph. Stat. 19, 900–929 (2010)
De la Cruz, R.: Bayesian non-linear regression models with skew-elliptical errors: applications to the classification of longitudinal profiles. Comput. Stat. Data Anal. 53, 436–449 (2008)
Du, Q., Faber, V., Gunzburger, M.: Centroidal Voronoi tessellations: applications and algorithms. SIAM Rev. 41, 637–676 (1999)
Even, S.: Graph Algorithms, 2nd edn. Cambridge University Press, New York (2011)
Forina, M., Armanino, C., Lanteri, S., Tiscornia, E.: Classification of olive oils from their fatty acid composition. In: Food Research and Data Analysis, pp. 189–214. Applied Sc. Publishers, London (1983)
Forina, M., Armanino, C., Castino, M., Ubigli, M.: Multivariate data analysis as a discriminating method of the origin of wines. Vitis 25, 189–201 (1986)
Fraley, C., Raftery, A.E.: Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002)
Fraley, C., Raftery, A.E.: MCLUST vers. 3 for R: normal mixture modeling and model-based clustering. Tech. Rep. 504, Univ. Washington, Dep. Stat. (2006), rev. 2009
Friedman, J.H.: On bias, variance, 0–1 loss, and the curse of dimensionality. Data Min. Knowl. Discov. 1, 55–77 (1997)
Gower, J.C., Ross, G.J.S.: Minimum spanning trees and single linkage cluster analysis. J. R. Stat. Soc., Ser. C, Appl. Stat. 18, 54–64 (1969)
Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975)
Kriegel, H., Kröger, P., Sander, J., Zimek, A.: Density-based clustering. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1, 231–240 (2011)
Krznaric, D., Levcopoulos, C.: Fast algorithms for complete linkage clustering. Discrete Comput. Geom. 19, 131–145 (1998)
Lubischew, A.A.: On the use of discriminant analysis in taxonomy. Biometrics 18, 455–477 (1962)
Menardi, G., Torelli, N.: Reducing data dimension for cluster detection. J. Stat. Comput. Simul. (2012). doi:10.1080/00949655.2012.679032
Minnotte, M.C.: Nonparametric testing for the existence of modes. Ann. Stat. 25, 1646–1660 (1997)
Müller, D.W., Sawitzki, G.: Excess mass estimates and tests for multimodality. J. Am. Stat. Assoc. 86, 738–746 (1991)
Prates, M., Lachos, V., Cabral, C.: R package mixsmsn: fitting finite mixture of scale mixture of skew-normal distributions (version 1.0-3) (2012). http://cran.r-project.org/web/packages/mixsmsn/index.html
Ooi, H.: Density visualization and mode hunting using trees. J. Comput. Graph. Stat. 11, 328–347 (2002)
R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2013). http://www.R-project.org/. ISBN 3-900051-07-0
Rinaldo, A., Wasserman, L.: Generalized density clustering. Ann. Stat. 38, 2678–2722 (2010)
Sahu, S.K., Dey, D.K., Branco, M.D.: A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003)
Scott, D.W., Sain, S.: Multidimensional Density Estimation. Handbook of Statistics, vol. 24, pp. 229–261 (2005)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York (1986)
Stuetzle, W.: Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J. Classif. 20, 25–47 (2003)
Stuetzle, W., Nugent, R.: A generalized single linkage method for estimating the cluster tree of a density. J. Comput. Graph. Stat. 19, 397–418 (2010)
Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a dataset via the GAP statistic. J. R. Stat. Soc., Ser. B, Stat. Methodol. 63, 411–423 (2001)
Wishart, D.: Mode analysis: a generalization of nearest neighbor which reduces chaining effects. In: Cole, A.J. (ed.) Numerical Taxonomy, pp. 282–308. Academic Press, London (1969)

Acknowledgements

We wish to thank the two anonymous referees and an Associate Editor for their valuable comments, which prompted a clearer exposition of the original formulation and helped to improve the overall outcome.

Author information

Authors and Affiliations

Department of Statistical Sciences, University of Padua, via Cesare Battisti 241, 35121, Padova, Italy
Giovanna Menardi & Adelchi Azzalini

Corresponding author

Correspondence to Giovanna Menardi.

About this article

Cite this article

Menardi, G., Azzalini, A. An advancement in clustering via nonparametric density estimation. Stat Comput 24, 753–767 (2014). https://doi.org/10.1007/s11222-013-9400-x

Received: 20 February 2012
Accepted: 16 April 2013
Published: 09 May 2013
Issue Date: September 2014
DOI: https://doi.org/10.1007/s11222-013-9400-x
