Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6804)

Abstract

Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources, including Persian. The shortage of efficient, reliable linguistic resources and fundamental text processing modules for Persian has been a challenge for researchers investigating this language. We employ a newly proposed cross-lingual sense disambiguation algorithm to automatically create large sense tagged corpora. The initial evaluation of the tagged corpus indicates promising results.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sarrafzadeh, B., Yakovets, N., Cercone, N., An, A. (2011). Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2011. Lecture Notes in Computer Science, vol 6804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21916-0_48

  • DOI: https://doi.org/10.1007/978-3-642-21916-0_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21915-3

  • Online ISBN: 978-3-642-21916-0

  • eBook Packages: Computer Science, Computer Science (R0)
