Using Term Clustering and Supervised Term Affinity Construction to Boost Text Classification

Chong Wang & Wenyuan Wang

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Measuring similarity is a crucial step in many machine learning problems. The traditional cosine similarity cannot represent semantic relationships between terms. This paper explores a kernel-based similarity measure built on term clustering. An affinity matrix of terms is constructed from term co-occurrence in both unsupervised and supervised ways. Normalized cut is then used to cluster the terms and cut off noisy edges, and a diffusion kernel measures the similarity between terms within the same cluster. Experiments show that the methods give satisfactory results even when the training set is small.
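
As a rough illustration of the pipeline the abstract outlines, the following Python sketch builds an unsupervised co-occurrence affinity matrix, turns it into a diffusion kernel, and uses that kernel in place of the plain dot product. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy corpus, the value of `beta`, and the choice of document-level co-occurrence counts are all illustrative. The normalized-cut clustering and the supervised affinity construction are omitted, so the kernel here is computed over the whole term graph rather than within each cluster.

```python
import numpy as np

def cooccurrence_affinity(X):
    """Term-term affinity: number of documents in which two terms co-occur.

    X is a (documents x terms) count matrix. This counting scheme is an
    assumption for illustration; the paper's exact weighting is not given here.
    """
    B = (X > 0).astype(float)   # binary document-term incidence
    A = B.T @ B                 # A[i, j] = number of docs containing both i and j
    np.fill_diagonal(A, 0.0)    # drop self-loops
    return A

def diffusion_kernel(A, beta=0.5):
    """Diffusion kernel K = exp(-beta * L), with L = D - A the graph Laplacian."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)   # L is symmetric, so eigh applies
    # Matrix exponential through the eigendecomposition of L.
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

def kernel_cosine(x, y, K):
    """Cosine similarity of two term vectors under the term kernel K."""
    return (x @ K @ y) / np.sqrt((x @ K @ x) * (y @ K @ y))

# Toy corpus (hypothetical). Columns: car, automobile, engine, banana.
X = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

A = cooccurrence_affinity(X)
K = diffusion_kernel(A, beta=0.5)

d_car = np.array([1.0, 0.0, 0.0, 0.0])     # document containing only "car"
d_auto = np.array([0.0, 1.0, 0.0, 0.0])    # document containing only "automobile"
d_banana = np.array([0.0, 0.0, 0.0, 1.0])  # document containing only "banana"

plain_sim = float(d_car @ d_auto)              # ordinary dot product: no shared terms
kernel_sim = kernel_cosine(d_car, d_auto, K)   # related terms now score above zero
unrelated_sim = kernel_cosine(d_car, d_banana, K)
```

In this toy graph "car" and "automobile" never appear in the same test vector but co-occur in the corpus, so the kernel similarity is positive where the plain dot product is zero, while "banana", which co-occurs with nothing, stays unrelated.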

Author information

Authors and Affiliations

Department of Automation, Tsinghua University, 100084, Beijing, P.R.China
Chong Wang & Wenyuan Wang

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
Tu Bao Ho
University of Hong Kong, Pokfulam Road, Hong Kong, China
David Cheung
Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, USA
Huan Liu

About this paper

Cite this paper

Wang, C., Wang, W. (2005). Using Term Clustering and Supervised Term Affinity Construction to Boost Text Classification. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_95

DOI: https://doi.org/10.1007/11430919_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science; Computer Science (R0)
