Abstract
Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Because semantic annotation is usually performed by humans, such corpora are limited to a handful of tagged texts and are unavailable for many languages with scarce resources, including Persian. The shortage of efficient, reliable linguistic resources and fundamental text processing modules for Persian has been a challenge for researchers investigating this language. We employ a newly proposed cross-lingual sense disambiguation algorithm to automatically create large sense tagged corpora. An initial evaluation of the tagged corpus indicates promising results.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sarrafzadeh, B., Yakovets, N., Cercone, N., An, A. (2011). Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2011. Lecture Notes in Computer Science, vol 6804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21916-0_48
DOI: https://doi.org/10.1007/978-3-642-21916-0_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21915-3
Online ISBN: 978-3-642-21916-0
eBook Packages: Computer Science, Computer Science (R0)