Word Segmentation and POS Tagging for Chinese Keyphrase Extraction

  • Conference paper
Advanced Data Mining and Applications (ADMA 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3584)

Abstract

Keyphrases are essential for many text mining applications. This paper proposes a system for automatically extracting keyphrases from Chinese text. To address a particular problem of Chinese information processing, a lexicon-based word segmentation approach is presented. For this purpose, a verb lexicon, a functional word lexicon and a stop word lexicon are constructed. A predefined keyphrase lexicon is applied to improve extraction performance. The approach uses a small Part-Of-Speech (POS) tagset to index phrases simply according to these lexicons. It is especially effective for identifying phrases in the form of combinations of nouns, adjectives and verbs. Keyphrases are sifted by their weighted TF-IDF (Term occurrence Frequency-Inverse Document Frequency) values. New keyphrases are added to the keyphrase lexicon.
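The abstract does not spell out the matching strategy or the weighting scheme, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes forward maximum matching for the lexicon-based segmentation, a simple stop word filter, and plain (unweighted) TF-IDF for ranking candidates. The function names, the `max_len` parameter and the toy lexicon entries are all hypothetical.

```python
import math
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Forward maximum matching (an assumed strategy): at each position,
    take the longest lexicon entry; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def drop_stop_words(tokens, stop_words):
    """Remove tokens that appear in the stop word lexicon."""
    return [t for t in tokens if t not in stop_words]


def tf_idf(doc_tokens, corpus_tokens):
    """Plain TF-IDF, tf(t, d) * log(N / df(t)). The paper's extra
    weighting is not described in the abstract and is omitted here."""
    n_docs = len(corpus_tokens)
    counts = Counter(doc_tokens)
    scores = {}
    for term, count in counts.items():
        # Guard against df == 0 when the document is not in the corpus.
        df = max(1, sum(1 for doc in corpus_tokens if term in doc))
        scores[term] = (count / len(doc_tokens)) * math.log(n_docs / df)
    return scores


# Toy lexicons (hypothetical entries, for illustration only).
lexicon = {"文本", "关键词", "抽取", "挖掘"}
stop_words = {"的"}

tokens = drop_stop_words(segment("文本的关键词抽取", lexicon), stop_words)
corpus = [tokens, ["文本", "挖掘"]]
ranked = sorted(tf_idf(tokens, corpus).items(), key=lambda kv: -kv[1])
```

In this toy corpus a term that occurs in every document, such as 文本, scores zero, while terms unique to one document rank highest. A real system along the lines of the abstract would also consult the verb, functional word and keyphrase lexicons when tagging candidates.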

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, X., Chen, J., Yan, P., Luo, X. (2005). Word Segmentation and POS Tagging for Chinese Keyphrase Extraction. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science (LNAI), vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_44

  • DOI: https://doi.org/10.1007/11527503_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27894-8

  • Online ISBN: 978-3-540-31877-4

  • eBook Packages: Computer Science, Computer Science (R0)
