Abstract
The key problem in supervised word sense disambiguation is the lack of a large-scale, high-quality word sense tagged corpus. Drawing on the Contemporary Chinese Semantic Dictionary, the Modern Chinese Dictionary (5th Edition) and the Chinese Lexical Semantic Knowledge Base, this paper analyzes the polysemous adjectives, nouns and verbs in these resources and merges them to construct the Zhengzhou University Contemporary Chinese Semantic Dictionary. The People’s Daily corpus is selected for annotation, yielding a word sense tagged corpus of 1.87 million words. The corpus is expected to provide better data support for natural language processing tasks such as automatic semantic analysis and word sense disambiguation. The paper also presents the detailed and rigorous word sense tagging specification followed during annotation. In addition, on a corpus from a new domain, the automatic annotation method performed well and can serve as a reference for subsequent work.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zan, H., Chen, J., Cheng, X., Mu, L. (2018). Construction of Word Sense Tagging Corpus. In: Hong, JF., Su, Q., Wu, JS. (eds) Chinese Lexical Semantics. CLSW 2018. Lecture Notes in Computer Science, vol 11173. Springer, Cham. https://doi.org/10.1007/978-3-030-04015-4_59
DOI: https://doi.org/10.1007/978-3-030-04015-4_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04014-7
Online ISBN: 978-3-030-04015-4
eBook Packages: Computer Science (R0)