Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging

Reza Karimpour, Amineh Ghorbani, Azadeh Pishdad, Mitra Mohtarami, Abolfazl AleAhmad, Hadi Amiri & Farhad Oroumchian

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5706)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

With the emergence of vast information resources, it is necessary to develop methods that retrieve the information most relevant to a user's needs. Such retrieval methods may benefit from natural language constructs that boost their results by achieving higher precision and recall. In this study, we use the part-of-speech properties of terms as an extra source of information about document and query terms, and we evaluate the impact of this information on the performance of Persian retrieval algorithms. As a complement to this research, we also examine the effect of stemming. Our findings indicate that part-of-speech tags on their own may have only a small influence on the effectiveness of the retrieved results. However, when this information is combined with stemming, it improves the accuracy of the outcomes considerably.


Author information

Authors and Affiliations

Electrical and Computer Engineering Faculty, University of Tehran, Iran
Reza Karimpour, Amineh Ghorbani, Azadeh Pishdad, Mitra Mohtarami, Abolfazl AleAhmad & Hadi Amiri
University of Wollongong in Dubai, United Arab Emirates
Farhad Oroumchian

Editor information

Editors and Affiliations

Istituto di Scienza e Tecnologie dell’Informazione, CNR, Pisa, Italy
Carol Peters
RWTH Aachen University, Aachen, Germany
Thomas Deselaers
University of Padua, Padua, Italy
Nicola Ferro
LSI-UNED, Madrid, Spain
Julio Gonzalo & Anselmo Peñas
Dublin City University, Dublin 9, Ireland
Gareth J. F. Jones
Helsinki University of Technology, Espoo, Finland
Mikko Kurimo
University of Hildesheim, Hildesheim, Germany
Thomas Mandl
Humboldt University Berlin, Germany
Vivien Petras

About this paper

Cite this paper

Karimpour, R. et al. (2009). Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging. In: Peters, C., et al. Evaluating Systems for Multilingual and Multimodal Information Access. CLEF 2008. Lecture Notes in Computer Science, vol 5706. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04447-2_10

DOI: https://doi.org/10.1007/978-3-642-04447-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04446-5
Online ISBN: 978-3-642-04447-2
eBook Packages: Computer Science, Computer Science (R0)
