Abstract
Keyphrases are essential for many text mining applications. This paper proposes a system for automatically extracting keyphrases from Chinese text. To address word segmentation, a problem particular to Chinese information processing, a lexicon-based segmentation approach is presented. For this purpose, a verb lexicon, a function word lexicon and a stop word lexicon are constructed. A predefined keyphrase lexicon is applied to improve extraction performance. The approach uses a small Part-Of-Speech (POS) tagset to index phrases simply according to these lexicons. It is especially effective for identifying phrases formed from combinations of nouns, adjectives and verbs. Candidate keyphrases are filtered by their weighted TF-IDF (Term occurrence Frequency-Inverse Document Frequency) values. Newly extracted keyphrases are added to the keyphrase lexicon.
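The two main steps named in the abstract, lexicon-based segmentation and TF-IDF ranking, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the lexicons are matched, so forward maximum matching is assumed here, and the TF-IDF shown is the plain, unweighted form because the paper's weighting scheme is not described in the abstract. The names `segment`, `tf_idf`, `max_len` and the sample lexicon are hypothetical.

```python
import math
from collections import Counter


def segment(text, lexicon, stop_words=frozenset(), max_len=4):
    """Forward maximum matching (an assumed strategy, not taken from the paper).

    At each position the longest lexicon entry is taken; a single character
    is the fallback. Tokens found in stop_words are dropped.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                if piece not in stop_words:
                    tokens.append(piece)
                i += size
                break
    return tokens


def tf_idf(docs):
    """Plain TF-IDF over a list of token lists; returns one score dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


# Hypothetical sample lexicon and stop word list, for illustration only.
lexicon = {"中文", "关键词", "提取", "系统"}
tokens = segment("中文的关键词提取系统", lexicon, stop_words={"的"})
# tokens == ["中文", "关键词", "提取", "系统"]
```

A term that occurs in every document receives a score of zero under this formula, which is why ranking by TF-IDF favors phrases that are distinctive for a given document.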
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, X., Chen, J., Yan, P., Luo, X. (2005). Word Segmentation and POS Tagging for Chinese Keyphrase Extraction. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science, vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_44
DOI: https://doi.org/10.1007/11527503_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27894-8
Online ISBN: 978-3-540-31877-4
eBook Packages: Computer Science, Computer Science (R0)