Abstract
This contribution addresses geographic information contained in unstructured textual documents. We propose a general indexing strategy designed for spatial information, which could also be applied to temporal and thematic information. More specifically, we have developed a process flow that indexes the spatial information contained in textual documents: it interprets spatial information and computes the corresponding accurate footprints. Our goal is to normalize spatial information of heterogeneous granularity and scale (points, polylines, polygons). This normalization is carried out at the index level by grouping spatial information within spatial areas and by using statistics to compute frequencies for these areas and weights for the retrieved documents.
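The index-level normalization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how spatial areas are defined or which statistics are used, so the sketch assumes a regular grid of square cells as the "spatial areas", bounding boxes as simplified footprints, and a TF-IDF-style formula for the weights. All function names (`cells_for_bbox`, `build_index`, `tf_idf_weights`) and the sample data are hypothetical.

```python
import math
from collections import Counter


def cells_for_bbox(bbox, cell_size):
    """Return the grid cells (col, row) overlapped by a bounding box.

    Assumption: a footprint is simplified to (min_x, min_y, max_x, max_y);
    a point is a degenerate box with min == max.
    """
    min_x, min_y, max_x, max_y = bbox
    col0, col1 = math.floor(min_x / cell_size), math.floor(max_x / cell_size)
    row0, row1 = math.floor(min_y / cell_size), math.floor(max_y / cell_size)
    return {(c, r) for c in range(col0, col1 + 1) for r in range(row0, row1 + 1)}


def build_index(doc_footprints, cell_size):
    """Group each document's footprints into cells and count cell frequencies."""
    index = {}
    for doc_id, bboxes in doc_footprints.items():
        freq = Counter()
        for bbox in bboxes:
            freq.update(cells_for_bbox(bbox, cell_size))
        index[doc_id] = freq
    return index


def tf_idf_weights(index):
    """Weight each (document, cell) pair with a TF-IDF-style score (assumed)."""
    n_docs = len(index)
    doc_freq = Counter()
    for freq in index.values():
        doc_freq.update(freq.keys())  # number of documents touching each cell
    weights = {}
    for doc_id, freq in index.items():
        total = sum(freq.values())
        weights[doc_id] = {
            cell: (count / total) * math.log(n_docs / doc_freq[cell])
            for cell, count in freq.items()
        }
    return weights


# Hypothetical sample: two documents with footprints of different extents.
docs = {
    "d1": [(0.2, 0.2, 0.4, 0.4), (0.1, 0.1, 0.3, 0.3)],
    "d2": [(0.5, 0.5, 1.5, 0.6)],
}
index = build_index(docs, cell_size=1.0)
weights = tf_idf_weights(index)
print(weights["d2"])
```

With this scheme, footprints of any geometry type end up in the same unit (a cell), so their frequencies become comparable across documents. A cell shared by every document receives a weight of zero, which mirrors how TF-IDF discounts terms that do not discriminate between documents.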
Notes
- 1. Mountains of the south west of France.
- 2. Part of this project is supported by the Greater Pau City Council and the MIDR media library.
- 3. E.g., for the word “forgotten”, truncation returns “forgot”.
- 4. E.g., for the word “forgotten”, lemmatization returns “forget”.
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this paper
Cite this paper
Palacio, D., Sallaberry, C., Gaio, M. (2012). Normalizing Spatial Information to Improve Geographical Information Indexing and Retrieval in Digital Libraries. In: Yeh, A., Shi, W., Leung, Y., Zhou, C. (eds) Advances in Spatial Data Handling and GIS. Lecture Notes in Geoinformation and Cartography. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25926-5_6
DOI: https://doi.org/10.1007/978-3-642-25926-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25925-8
Online ISBN: 978-3-642-25926-5
eBook Packages: Earth and Environmental Science (R0)