Abstract
With the emergence of vast information resources, methods are needed that retrieve the information most relevant to a user's needs. Such retrieval methods may benefit from natural language constructs, which can raise precision and recall. In this study, we used the part-of-speech properties of terms as an extra source of information about document and query terms and evaluated the impact of this information on the performance of Persian retrieval algorithms. We also examined the effect of stemming as a complement to this research. Our findings indicate that part-of-speech tags alone may have only a small influence on the effectiveness of the retrieved results. However, when this information is combined with stemming, it improves the accuracy of the outcomes considerably.
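The idea of combining stemming with part-of-speech information can be illustrated with a minimal sketch. It is not the method used in the paper: the suffix list, the `toy_stem` and `index_keys` helpers, the tag names, and the English sample tokens are all hypothetical and chosen only for readability. The sketch builds index keys from a stemmed term plus its tag, so that a noun and a verb sharing a stem stay distinct, and it computes precision and recall for one retrieved set.

```python
# Hypothetical suffix list for illustration only; this is not a Persian stemmer.
TOY_SUFFIXES = ("ing", "es", "s")


def toy_stem(term):
    """Strip the first matching toy suffix, keeping at least three characters."""
    for suffix in TOY_SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def index_keys(tagged_terms, use_stem=True, use_pos=True):
    """Turn (term, pos_tag) pairs into index keys such as 'book/N'."""
    keys = []
    for term, pos in tagged_terms:
        base = toy_stem(term.lower()) if use_stem else term.lower()
        keys.append(f"{base}/{pos}" if use_pos else base)
    return keys


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed sample input: tokens already tagged by some POS tagger.
doc = [("Books", "N"), ("book", "V"), ("flights", "N")]
keys_with_pos = index_keys(doc)
keys_without_pos = index_keys(doc, use_pos=False)

# Assumed document identifiers for one query.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

With tags attached, the noun "Books" and the verb "book" map to different keys even though they share a stem; without tags they collapse to the same key. Whether that distinction helps or hurts retrieval is an empirical question of the kind the study evaluates.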
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karimpour, R. et al. (2009). Improving Persian Information Retrieval Systems Using Stemming and Part of Speech Tagging. In: Peters, C., et al. Evaluating Systems for Multilingual and Multimodal Information Access. CLEF 2008. Lecture Notes in Computer Science, vol 5706. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04447-2_10
DOI: https://doi.org/10.1007/978-3-642-04447-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04446-5
Online ISBN: 978-3-642-04447-2
eBook Packages: Computer Science, Computer Science (R0)