Construction of Word Sense Tagging Corpus

  • Conference paper
Chinese Lexical Semantics (CLSW 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11173)

Abstract

The key obstacle to supervised word sense disambiguation is the lack of a large-scale, high-quality word sense tagged corpus. Drawing on the Contemporary Chinese Semantic Dictionary, the Modern Chinese Dictionary (5th Edition) and the Chinese Lexical Semantic Knowledge Base, this paper analyzes the polysemous adjectives, nouns and verbs in these resources and merges them to construct the Zhengzhou University Contemporary Chinese Semantic Dictionary. The People’s Daily corpus is selected for annotation, and a word sense tagged corpus of 1.87 million words is constructed. The corpus is expected to provide better data support for natural language processing tasks such as automatic semantic analysis and word sense disambiguation. The paper also presents the detailed and rigorous word sense tagging specification followed during annotation. In addition, an automatic annotation method achieved excellent performance on a corpus from a new domain and can serve as a reference for subsequent work.

Corresponding author

Correspondence to Hongying Zan.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zan, H., Chen, J., Cheng, X., Mu, L. (2018). Construction of Word Sense Tagging Corpus. In: Hong, JF., Su, Q., Wu, JS. (eds) Chinese Lexical Semantics. CLSW 2018. Lecture Notes in Computer Science, vol 11173. Springer, Cham. https://doi.org/10.1007/978-3-030-04015-4_59

  • DOI: https://doi.org/10.1007/978-3-030-04015-4_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04014-7

  • Online ISBN: 978-3-030-04015-4

  • eBook Packages: Computer Science, Computer Science (R0)
