Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging

  • Conference paper
Evaluating Systems for Multilingual and Multimodal Information Access (CLEF 2008)

Abstract

With the emergence of vast information resources, it is necessary to develop methods that retrieve the information most relevant to users' needs. Such retrieval methods may exploit natural language constructs to achieve higher precision and recall. In this study, we use the part-of-speech properties of terms as an extra source of information about document and query terms, and we evaluate the impact of this information on the performance of Persian retrieval algorithms. As a complement, we also examine the effect of stemming. Our findings indicate that part-of-speech tags alone may have only a small influence on the effectiveness of the retrieved results. However, when this information is combined with stemming, it improves the accuracy of the results considerably.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Karimpour, R. et al. (2009). Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging. In: Peters, C., et al. Evaluating Systems for Multilingual and Multimodal Information Access. CLEF 2008. Lecture Notes in Computer Science, vol 5706. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04447-2_10

  • DOI: https://doi.org/10.1007/978-3-642-04447-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04446-5

  • Online ISBN: 978-3-642-04447-2

  • eBook Packages: Computer Science, Computer Science (R0)
