Ensemble clustering using semidefinite programming with applications

Vikas Singh¹,
Lopamudra Mukherjee²,
Jiming Peng³ &
…
Jinhui Xu⁴

1156 Accesses
52 Citations
Explore all metrics

Abstract

In this paper, we study the ensemble clustering problem, where the input is in the form of multiple clustering solutions. The goal of ensemble clustering algorithms is to aggregate the solutions into one solution that maximizes the agreement in the input ensemble. We obtain several new results for this problem. Specifically, we show that the notion of agreement under such circumstances can be better captured using a 2D string encoding rather than a voting strategy, which is common among existing approaches. Our optimization proceeds by first constructing a non-linear objective function which is then transformed into a 0-1 Semidefinite program (SDP) using novel convexification techniques. This model can be subsequently relaxed to a polynomial time solvable SDP. In addition to the theoretical contributions, our experimental results on standard machine learning and synthetic datasets show that this approach leads to improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. In addition, we identify several new application scenarios for this problem. These include combining multiple image segmentations and generating tissue maps from multiple-channel Diffusion Tensor brain images to identify the underlying structure of the brain.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

Ailon, N., Charikar, M., & Newman, A. (2005). Aggregating inconsistent information: ranking and clustering. In Proc. of symposium on theory of computing (pp. 684–693).
Anjos, M., & Wolkowicz, H. (1999). A strengthened sdp relaxation via a second lifting for the max-cut problem (Technical Report).
Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. In Proc. symposium on foundations of computer science (p. 238).
Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239. http://www.csd.uwo.ca/~olga/code.html.
Article Google Scholar
Charikar, M., Guruswami, V., & Wirth, A. (2005). Clustering with qualitative information. Journal of Computer and System Science, 71(3), 360–383.
Article MATH MathSciNet Google Scholar
Curve Evolution Segmentation Dataset (2009). http://barissumengen.com/seg/bsds_segmentations/html_for_visualization/curveevolution_testset.html.
de Silva, C. J., Masek, M., & Attikiouzel, Y. (2000). Combining data from different algorithms to segment the skin-airinterface in mammograms. In IEEE engineering in medicine and biology society (Vol. 2, pp. 1195–1198).
Dudoit, S., & Fridlyand, J. (2003). Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19(9).
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
Article Google Scholar
Fern, X. Z., & Brodley, C. E. (2003). Random projection for high dimensional data clustering: a cluster ensemble approach. In Proc. of ieee international conference on machine learning (pp. 186–193).
Fern, X. Z., & Brodley, C. E. (2004). Solving cluster ensemble problems by bipartite graph partitioning. In Proc. of international conference on machine learning (p. 36).
Filkov, V., & Skiena, S. (2003). Integrating microarray data by consensus clustering. In Proc. of international conference on tools with artificial intelligence (p. 418).
Fred, A. L. N., & Jain, A. K. (2006). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6).
Gasieniec, L., Jansson, J., & Lingas, A. (2000). Approximation algorithms for hamming clustering problems. In Proc. of symposium on combinatorial pattern matching (pp. 108–118).
Gionis, A., Mannila, H., & Tsaparas, P. (2005). Clustering aggregation. In Proc. of international conference on data engineering (pp. 341–352).
Giotis, I., & Guruswami, V. (2006). Correlation clustering with a fixed number of clusters. In Proc. of symposium on discrete algorithms (pp. 1167–1176).
Goel, N., Bebis, G., & Nefian, A. (2005). Face recognition experiments with random projection. In Proc. SPIE (Vol. 5779, pp. 426–437).
Goemans, M. X., & Williamson, D. P. (1995). Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42, 1115–1145.
Article MATH MathSciNet Google Scholar
Gordon, A. D., & Henderson, J. T. (1977). An algorithm for euclidean sum of squares classification. Biometrics, 33, 355–362.
Article MATH Google Scholar
Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of lipshitz mapping into hilbert space. Contemporary Mathematics, 26, 189–206.
MATH MathSciNet Google Scholar
Kuncheva, L. I., & Vetrov, D. P. (2006). Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11).
Lasserre, J. B. (2001). An explicit exact sdp relaxation for nonlinear 0-1 programs. In Proceedings of the 8th international IPCO conference on integer programming and combinatorial optimization (pp. 293–303).
Liu, C., & Wechsler, H. (2003). Independent component analysis of gabor features for face recognition. IEEE Transactions on Neural Networks, 14(4), 919–928.
Article Google Scholar
Liu, T., Li, H., Wong, K., Tarokh, A., Guo, L., & Wong, S. T. C. (2007). Brain tissue segmentation based on DTI data. NeuroImage, 38(1), 114–123.
Article Google Scholar
Löfberg, J. (2004). YALMIP: A toolbox for modeling and optimization in MATLAB. In CCA/ISIC/CACSD, September 2004.
Monti, S., Tamayo, P., Mesirov, J., & Golub, T. (2003). Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52(1–2), 91–118.
Article MATH Google Scholar
Ng, A. Y., Jordan, M. I., & Weiss, Y. (2001). On spectral clustering: analysis and an algorithm. In Advances in neural information processing systems (pp. 849–856).
Peng, J., & Wei, Y. (2007). Approximating k-means-type clustering via semidefinite programming. SIAM Journal on Optimization, 18(1), 186–205.
Article MATH MathSciNet Google Scholar
Peng, J., Mukherjee, L., Singh, V., Schuurmans, D., & Xu, L. (2009). An efficient algorithm for maximal margin clustering. Manuscript (under peer review).
Perlibakas, V. (2004). Distance measures for PCA-based face recognition. Pattern Recognition Letters, 25(6), 711–724.
Article Google Scholar
Rohlfing, T., & Maurer, C. R. Jr. (2005). Shape-based averaging for combination of multiple segmentations. In Medical image computing and computer-assisted intervention (MICCAI) (pp. 838–845).
Rohlfing, T., & Maurer, C. R. Jr. (2005). Multi-classifier framework for atlas-based image segmentation. Pattern Recognition Letters, 26(13), 2070–2079.
Article Google Scholar
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Article Google Scholar
Singh, V., Mukherjee, L., Peng, J., & Xu, J. (2007). Ensemble clustering using semidefinite programming. In Advances in neural information processing systems.
Strehl, A., & Ghosh, J. (2003). Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583–617.
Article MATH MathSciNet Google Scholar
Sturm, J. F. (1999). Using SeDuMi 1.02, A Matlab Toolbox for Optimization over Symmetric Cones. Optimization Methods and Software, 11–12, 625–653.
Article MathSciNet Google Scholar
The Berkeley Segmentation Database and Benchmark. (2009). Computer Science Department, UC-Berkeley.
Toh, K. C., Todd, M. J., & Tutuncu, R. (1999). SDPT3—a Matlab software package for semidefinite programming. Optimization Methods and Software, 11(12), 545–581.
Article MathSciNet Google Scholar
Topchy, A., Jain, A. K., & Punch, W. (2003). A mixture model for clustering ensembles. In Proc. of SIAM conference on data mining.
Topchy, A., Jain, A. K., & Punch, W. (2003). Combining multiple weak clusterings. In Proc. of IEEE international conference on data mining.
Turk, M., & Pentland, A. (1991). Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1), 71–86.
Article Google Scholar
Vandenberghe, L., & Boyd, S. (1996). Semidefinite programming. SIAM Review, 38, 49–95.
Article MATH MathSciNet Google Scholar
Vempala, S. S. (2004). The random projection method. DIMACS series in discrete mathematics and theoretical computer science (Vol. 65). Providence: Am. Math. Soc.
MATH Google Scholar
Warfield, S. K., Zou, K. H., & Wells, W. M. (2004). Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging, 23(7), 903–921.
Article Google Scholar
Xing, E. P., & Jordan, M. I. (2003). On semidefinite relaxation for normalized k-cut and connections to spectral clustering (Technical Report UCB/CSD-03-1265). EECS Department, University of California Berkeley.
Xu, L., Neufeld, J., Larson, B., & Schuurmans, D. (2005). Maximum margin clustering. In Advances in neural information processing systems Cambridge: MIT Press.
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Biostatistics & Medical Informatics, University of Wisconsin–Madison, Madison, WI, USA
Vikas Singh
Department of Mathematics and Computer Science, University of Wisconsin–Whitewater, Whitewater, WI, USA
Lopamudra Mukherjee
Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
Jiming Peng
Department of Computer Sci. and Eng., The State University of New York at Buffalo, Buffalo, NY, USA
Jinhui Xu

Authors

Vikas Singh
View author publications
You can also search for this author in PubMed Google Scholar
Lopamudra Mukherjee
View author publications
You can also search for this author in PubMed Google Scholar
Jiming Peng
View author publications
You can also search for this author in PubMed Google Scholar
Jinhui Xu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lopamudra Mukherjee.

Additional information

Editors: Nicolo Cesa-Bianchi, David R. Hardoon, and Gayle Leen.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Singh, V., Mukherjee, L., Peng, J. et al. Ensemble clustering using semidefinite programming with applications. Mach Learn 79, 177–200 (2010). https://doi.org/10.1007/s10994-009-5158-y

Download citation

Received: 26 February 2009
Revised: 09 October 2009
Accepted: 12 October 2009
Published: 22 December 2009
Issue Date: May 2010
DOI: https://doi.org/10.1007/s10994-009-5158-y

Ensemble clustering using semidefinite programming with applications

Abstract

Article PDF

Similar content being viewed by others

Ensemble Registration: Incorporating Structural Information into Groupwise Registration

$$SC^2$$ : A Selection-Based Consensus Clustering Approach

Clustering ensemble method

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Navigation

Ensemble clustering using semidefinite programming with applications

Abstract

Article PDF

Similar content being viewed by others

Ensemble Registration: Incorporating Structural Information into Groupwise Registration

$$SC^2$$ : A Selection-Based Consensus Clustering Approach

Clustering ensemble method

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation