Abstract
In this paper we present a Pareto based multi objective algorithm for semi supervised clustering (PSC). Semi-supervised clustering uses a small amount of supervised data known as constraints, to assist unsupervised learning. Instead of modifying the clustering objective function, we add another objective function to satisfy specified constraints. We use a lexicographically ordered cluster assignment step to direct the search and a Pareto based multi objective evolutionary algorithm to maintain diversity in the population. Two objectives are considered: one that minimizes the intra cluster variance and another that minimizes the number of constraint violations. Experiments show the superiority of the method over a greedy algorithm (PCK-means) and a genetic algorithm (COP-HGA).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Comput. Surv., 264–323 (1999)
Basu, S., Banerjee, A., Mooney, R.J.: Active Semi-Supervision for Pairwise Constrained Clustering. In: SDM (2004)
Basu, S., Bilenko, M., Mooney, R.J.: A probabilistic framework for semi-supervised clustering. In: KDD, pp. 59–68 (2004)
Basu, S., Davidson, I., Wagstaff, K.L.: Constrained Clustering: Advances in Algorithms, Theory, and Applications, 1st edn. Chapman and Hall/CRC (2008)
Aliguliyev, R.M.: Clustering of document collection - A weighting approach. Expert Syst. Appl, 7904–7916 (2009)
Maulik, U., Bandyopadhyay, S.: Genetic algorithm-based clustering technique. Pattern Recognition, 1455–1465 (2000)
Das, S., Abraham, A., Konar, A.: Metaheuristic Clustering. SCI, vol. 178. Springer, Heidelberg (2009)
Cui, X., Palathingal, P., Potok, P.: Document Clustering using Particle Swarm Optimization. In: IEEE Swarm Intelligence Symposium 2005, Pasadena, California, pp. 185–191 (2005)
Handl, J., Meyer, B.: Ant-based and swarm-based clustering. Swarm Intelligence, 95–113 (2007)
Das, S., Konar, A.: Automatic image pixel clustering with an improved differential evolution. Appl. Soft Comput., 226–236 (2009)
Song, W., Choi, L.C., Park, S.C., Ding, X.F.: Fuzzy evolutionary optimization modeling and its applications to unsupervised categorization and extractive summarization. Expert Syst. Appl., 9112–9121 (2011)
Hong, Y., Kwong, S., Xiong, H., Ren, Q.: Genetic-guided semi-supervised clustering algorithm with instance-level constraints. In: GECCO, pp. 1381–1388 (2008)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, 1st edn. Cambridge University Press (2008)
Davies, D., Bouldin, D.: A cluster separationmeasure. IEEE Trans. Pattern Anal. Mach. Intell. 1, 224–227 (1979)
Coelho, G.P., França, F.O.D., Zuben, F.J.V.: Multi-Objective Biclustering: When Non-dominated Solutions are not Enough. J. Math. Model. Algorithms, 175–202 (2009)
Maitra, M., Chatterjee, A.: A hybrid cooperative-comprehensive learning based PSO algorithm for image segmentation using multilevel thresholding. Expert Syst. Appl., 1341–1350 (2008)
Mitra, S., Banka, H.: Multi-objective evolutionary biclustering of gene expression data. Pattern Recognition, 2464–2477 (2006)
Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S.: Constrained K-means Clustering with Background Knowledge. In: ICML, pp. 577–584 (2001)
Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: ICML (2004)
Hong, Y., Kwong, S., Wang, H., Ren, Q., Chang, Y.: Probabilistic and Graphical Model based Genetic Algorithm Driven Clustering with Instance-level Constraints. In: IEEE Congress on Evolutionary Computation, pp. 322–329 (2008)
Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimisation: NSGA-II. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 849–858. Springer, Heidelberg (2000)
Coello, C.A.C.: A Comprehensive Survey of Evolutionary-Based Multiobjective Optimization Techniques. Knowl. Inf. Syst., 129–156 (1999)
Sindhya, K., Deb, K., Miettinen, K.: A Local Search Based Evolutionary Multi-objective Optimization Approach for Fast and Accurate Convergence. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN 2008. LNCS, vol. 5199, pp. 815–824. Springer, Heidelberg (2008)
Mahfoud, S.: Niching Methods for Genetic Algorithms. PhD thesis, University of Illinois at Urbana Champaign (1995)
Hruschka, E.R., Campello, R.J.G.B., Freitas, A.A., Carvalho, A.C.P.L.F.D.: A Survey of Evolutionary Algorithms for Clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 133–155 (2009)
Pizzuti, C.: GA-Net: A Genetic Algorithm for Community Detection in Social Networks. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN 2008. LNCS, vol. 5199, pp. 1081–1090. Springer, Heidelberg (2008)
Firat, A., Chatterjee, S., Yilmaz, M.: Genetic clustering of social networks using random walks. Computational Statistics & Data Analysis, 6285–6294 (2007)
Mitchell, T.M.: Machine learning. McGraw Hill series in computer science, pp. 1–414 (1997)
Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 888–905 (2000)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On Spectral Clustering: Analysis and an algorithm. In: NIPS, pp. 849–856 (2001)
UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html
Repository of information on semi-supervised clustering, University of Texas at Austin, http://www.cs.utexas.edu/users/ml/risc/
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ebrahimi, J., Saniee Abadeh, M. (2012). Semi Supervised Clustering: A Pareto Approach. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science(), vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_19
Download citation
DOI: https://doi.org/10.1007/978-3-642-31537-4_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4
eBook Packages: Computer ScienceComputer Science (R0)