Abstract
Measuring similarity is a crucial step in many machine learning problems. Traditional cosine similarity cannot represent the semantic relationships between terms. This paper explores a kernel-based similarity measure built on term clustering. An affinity matrix of terms is constructed from term co-occurrence in both unsupervised and supervised ways. Normalized cut is then used to cluster the terms, which cuts off noisy edges. A diffusion kernel is adopted to measure the kernel-like similarity of terms within the same cluster. Experiments demonstrate that our methods give satisfactory results, even when the training set is small.
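The core idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It covers only the unsupervised co-occurrence affinity and a diffusion kernel of the form K = exp(-beta * L), with L the graph Laplacian. The supervised affinity construction and the normalized-cut clustering step are omitted, so the kernel is computed over the whole term graph rather than per cluster. The function names, the toy corpus, and the value of `beta` are all assumptions made for illustration.

```python
import numpy as np

def cooccurrence_affinity(X):
    """Term-term affinity: number of documents in which two terms co-occur."""
    B = (X > 0).astype(float)     # binary document-term incidence
    A = B.T @ B                   # A[i, j] = docs containing both term i and term j
    np.fill_diagonal(A, 0.0)      # drop self-loops
    return A

def diffusion_kernel(A, beta=0.5):
    """K = exp(-beta * L), L = D - A, via eigendecomposition of the symmetric L.

    beta is an illustrative choice, not a value taken from the paper.
    """
    L = np.diag(A.sum(axis=1)) - A
    w, V = np.linalg.eigh(L)              # L = V diag(w) V^T
    return (V * np.exp(-beta * w)) @ V.T  # V diag(exp(-beta w)) V^T

def kernel_cosine(x, y, K):
    """Cosine similarity in the feature space induced by the term kernel K."""
    return (x @ K @ y) / np.sqrt((x @ K @ x) * (y @ K @ y))

# Hypothetical toy corpus: rows are documents, columns are terms.
terms = ["car", "auto", "engine", "bank", "loan"]
X = np.array([
    [1, 0, 1, 0, 0],   # "car engine"
    [0, 1, 1, 0, 0],   # "auto engine"
    [0, 0, 0, 1, 1],   # "bank loan"
], dtype=float)

A = cooccurrence_affinity(X)
K = diffusion_kernel(A)

# Single-term documents used as queries.
car, auto, bank = np.eye(len(terms))[[0, 1, 3]]

plain_cosine = car @ auto / (np.linalg.norm(car) * np.linalg.norm(auto))
sim_car_auto = kernel_cosine(car, auto, K)
sim_car_bank = kernel_cosine(car, bank, K)

print("plain cosine(car, auto):  ", plain_cosine)
print("kernel cosine(car, auto): ", sim_car_auto)
print("kernel cosine(car, bank): ", sim_car_bank)
```

In this toy corpus "car" and "auto" never appear in the same document, so plain cosine similarity between them is zero. Both co-occur with "engine", so the diffusion kernel gives them a positive similarity. "car" and "bank" lie in disconnected parts of the term graph and stay at essentially zero.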
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, C., Wang, W. (2005). Using Term Clustering and Supervised Term Affinity Construction to Boost Text Classification. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_95
DOI: https://doi.org/10.1007/11430919_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science, Computer Science (R0)