Abstract
Fuzzy co-clustering is a method that performs simultaneous fuzzy clustering of objects and features. In this paper, we introduce a new fuzzy co-clustering algorithm for high-dimensional datasets called Cosine-Distance-based & Dual-partitioning Fuzzy Co-clustering (CODIALING FCC). Unlike many existing fuzzy co-clustering algorithms, CODIALING FCC is a dual-partitioning algorithm: it clusters the features in the same manner as it clusters the objects, that is, by partitioning them according to their natural groupings. It is also cosine-distance-based, because it uses the cosine distance to capture the degree to which objects and features belong to each co-cluster. Our main purpose in introducing this algorithm is to improve on the performance of some prominent existing fuzzy co-clustering algorithms on datasets whose clusters overlap heavily. In our view, this is crucial, since most real-world datasets involve a significant amount of overlap in their inherent clustering structures. We discuss how the dual-partitioning formulation makes this improvement possible. Experimental results on a toy problem and five large benchmark document datasets demonstrate that CODIALING FCC handles overlaps more effectively.
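The abstract does not give the objective function or update equations of CODIALING FCC, so the following is not the authors' algorithm. It is a hypothetical, minimal sketch of the two ideas the abstract names: cosine distance as the dissimilarity measure, and dual partitioning, meaning that the memberships of every object and of every feature both sum to 1 across the co-clusters. It assumes a fuzzy c-means-style membership rule with fuzzifier `m`, and a made-up alternating coupling in which features are compared with each co-cluster's object-membership profile and objects with its feature-membership profile. The names `cosine_distance`, `fuzzy_partition` and `dual_partition_cocluster`, and the toy matrix, are illustrative assumptions.

```python
import numpy as np

def cosine_distance(P, Y):
    """1 - cos(p, y) for every row p of P and every row y of Y -> shape (len(P), len(Y))."""
    Pn = P / np.maximum(np.linalg.norm(P, axis=1, keepdims=True), 1e-12)
    Yn = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return 1.0 - Pn @ Yn.T

def fuzzy_partition(D, m=2.0, eps=1e-12):
    """FCM-style memberships from distances D (clusters x items).
    Each column sums to 1, i.e. every item is partitioned over the clusters."""
    inv = np.maximum(D, eps) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def dual_partition_cocluster(X, n_clusters, m=2.0, n_iter=50, seed=0):
    """Hypothetical dual-partitioning co-clustering sketch.
    NOT the update equations of CODIALING FCC, which the abstract does not give.

    X: (n_objects, n_features) non-negative matrix, e.g. document-word counts.
    Returns U (clusters x objects) and V (clusters x features); both are
    column-normalised, so objects and features are partitioned the same way.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((n_clusters, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # Feature j (column j of X) vs. each co-cluster's object-membership profile U[c].
        V = fuzzy_partition(cosine_distance(U, X.T), m)
        # Object i (row i of X) vs. each co-cluster's feature-membership profile V[c].
        U = fuzzy_partition(cosine_distance(V, X), m)
    return U, V

# Toy data (assumed): two document groups; the last word occurs in every document.
X = np.array([[1, 1, 0, 0, 1],
              [1, 1, 0, 0, 1],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 1]], dtype=float)
U, V = dual_partition_cocluster(X, n_clusters=2)
print(U.argmax(axis=0))  # hard co-cluster label of each document
print(np.round(V, 2))    # feature memberships; each column sums to 1
```

Because each feature's memberships are normalised across co-clusters rather than within a co-cluster, a word shared by both document groups in this toy run ends up split roughly evenly between the two co-clusters instead of being ranked highly in both. Whether this mirrors the behaviour reported for CODIALING FCC would need to be checked against the full paper.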
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tjhi, WC., Chen, L. (2006). A New Fuzzy Co-clustering Algorithm for Categorization of Datasets with Overlapping Clusters. In: Li, X., Zaïane, O.R., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2006. Lecture Notes in Computer Science, vol 4093. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811305_36
DOI: https://doi.org/10.1007/11811305_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37025-3
Online ISBN: 978-3-540-37026-0
eBook Packages: Computer Science, Computer Science (R0)