Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian

Bahareh Sarrafzadeh, Nikolay Yakovets, Nick Cercone & Aijun An

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6804)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Because semantic annotation is usually performed by humans, such corpora are limited to a handful of tagged texts and are unavailable for many resource-scarce languages, including Persian. The shortage of efficient, reliable linguistic resources and fundamental text processing modules for Persian has been a challenge for researchers investigating this language. We employ a newly proposed cross-lingual sense disambiguation algorithm to automatically create large sense tagged corpora. The initial evaluation of the tagged corpus indicates promising results.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, York University, Canada
Bahareh Sarrafzadeh, Nikolay Yakovets, Nick Cercone & Aijun An

Editor information

Editors and Affiliations

Faculty of Electronics and Information Technology, Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Marzena Kryszkiewicz
Faculty of Electronics and Information Technology, Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Henryk Rybinski
University of Warsaw, 02-097, Warsaw, Poland
Andrzej Skowron
Faculty of Electronics and Information Technology, Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Zbigniew W. Raś

About this paper

Cite this paper

Sarrafzadeh, B., Yakovets, N., Cercone, N., An, A. (2011). Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2011. Lecture Notes in Computer Science, vol 6804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21916-0_48

DOI: https://doi.org/10.1007/978-3-642-21916-0_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21915-3
Online ISBN: 978-3-642-21916-0
eBook Packages: Computer Science, Computer Science (R0)
