Abstract
We introduce the concept of snipping, complementing that of trimming, in robust cluster analysis. An observation is snipped when some of its dimensions are discarded, but the remaining ones are used for clustering and estimation. Snipped k-means is performed through a probabilistic optimization algorithm which is guaranteed to converge to the global optimum. We show global robustness properties of our snipped k-means procedure. Simulations and a real data application to optical recognition of handwritten digits are used to illustrate and compare the approach.
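The notion of snipping described above can be illustrated with a minimal sketch. It is not the probabilistic optimization algorithm of the paper: it only shows one assignment step and one centroid update of a k-means-type iteration when a snipping mask is already given. The function names and the boolean `keep` mask (True for a retained entry, False for a snipped one) are hypothetical and introduced here only for illustration.

```python
from typing import List, Sequence


def snipped_assign(x: Sequence[float], keep: Sequence[bool],
                   centers: Sequence[Sequence[float]]) -> int:
    """Return the index of the nearest center, using only retained coordinates of x."""
    def sq_dist(c: Sequence[float]) -> float:
        # Snipped coordinates (keep is False) contribute nothing to the distance.
        return sum((xi - ci) ** 2 for xi, ci, k in zip(x, c, keep) if k)
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j]))


def snipped_update(X: Sequence[Sequence[float]], keep: Sequence[Sequence[bool]],
                   labels: Sequence[int],
                   centers: Sequence[Sequence[float]]) -> List[List[float]]:
    """Recompute every center coordinate as the mean of the retained entries only."""
    new_centers = [list(c) for c in centers]
    for j in range(len(centers)):
        for d in range(len(centers[j])):
            vals = [X[i][d] for i in range(len(X))
                    if labels[i] == j and keep[i][d]]
            if vals:  # assumption: keep the old value if no retained entry exists
                new_centers[j][d] = sum(vals) / len(vals)
    return new_centers


# Toy data: the second coordinate of the second observation is grossly
# contaminated and is snipped, but its first coordinate is still used.
X = [[0.0, 0.1], [0.2, 100.0], [5.0, 5.1], [5.2, 4.9]]
keep = [[True, True], [True, False], [True, True], [True, True]]
centers = [[0.0, 0.0], [5.0, 5.0]]

labels = [snipped_assign(x, k, centers) for x, k in zip(X, keep)]
new_centers = snipped_update(X, keep, labels, centers)
```

In this toy example the contaminated entry affects neither the assignment of its observation nor the centroid estimate, whereas trimming would have discarded the whole observation, including its clean first coordinate.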
Acknowledgements
The author is grateful to an AE and two anonymous referees for very kind suggestions.
Cite this article
Farcomeni, A. Snipping for robust k-means clustering under component-wise contamination. Stat Comput 24, 907–919 (2014). https://doi.org/10.1007/s11222-013-9410-8
DOI: https://doi.org/10.1007/s11222-013-9410-8