Abstract
Most existing approaches to XML keyword search focus on querying a single data source. However, searching hundreds or even thousands of (distributed) data sources by sequentially querying each one is extremely costly and can be impractical. In this paper, we propose an approach for selecting the top-k data sources for a given query in order to avoid the high cost of searching numerous, potentially irrelevant data sources. The proposed approach efficiently selects the top-k most relevant data sources without querying the data sources themselves. We propose a ranking function for measuring the strength of correlation between keywords in a data source and summarize the data sources as keyword correlation graphs (K-Graphs). The top-k relevant data sources are selected by estimating the relevance of the corresponding K-Graphs to the query. Experimental results show that the approach achieves good performance under a variety of experimental parameters.
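The selection idea can be illustrated with a minimal sketch. The abstract does not define the ranking function or the K-Graph construction, so everything below is an assumption made for illustration only: the edge weight is taken to be a plain co-occurrence count, the relevance estimate is the sum of edge weights over the query's keyword pairs, and the names `build_kgraph`, `estimate_relevance`, and `select_top_k` are hypothetical. The sketch handles queries of two or more keywords.

```python
from collections import Counter
from itertools import combinations
import heapq


def build_kgraph(fragments):
    """Summarize one data source as a weighted keyword graph.

    Assumption: the weight of edge (a, b) is the number of fragments in
    which keywords a and b co-occur. The paper's actual correlation
    measure is not stated in the abstract.
    """
    graph = Counter()
    for keywords in fragments:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[(a, b)] += 1
    return graph


def estimate_relevance(graph, query):
    """Hypothetical relevance estimate of a K-Graph to a query.

    Sums the edge weights over all pairs of query keywords and returns 0
    if any pair never co-occurs in the source (or if the query has fewer
    than two distinct keywords).
    """
    pairs = list(combinations(sorted(set(query)), 2))
    weights = [graph.get(pair, 0) for pair in pairs]
    if not weights or min(weights) == 0:
        return 0
    return sum(weights)


def select_top_k(kgraphs, query, k):
    """Rank sources using only their summaries; no source is queried."""
    scored = ((estimate_relevance(g, query), name) for name, g in kgraphs.items())
    return [name for score, name in heapq.nlargest(k, scored) if score > 0]


# Toy data: each source is a list of keyword sets, one per XML fragment.
sources = {
    "dblp": [["xml", "keyword", "search"], ["xml", "search"], ["graph", "search"]],
    "movies": [["actor", "film"], ["xml", "film"]],
    "wiki": [["xml", "search"], ["keyword", "graph"]],
}
kgraphs = {name: build_kgraph(frags) for name, frags in sources.items()}
top = select_top_k(kgraphs, ["xml", "search"], k=2)
```

The summaries are built once per source, so a query touches only the small graphs rather than the sources themselves, which is the cost saving the abstract describes.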
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, K., Cao, J. (2011). K-Graphs: Selecting Top-k Data Sources for XML Keyword Queries. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6860. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23088-2_31
DOI: https://doi.org/10.1007/978-3-642-23088-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23087-5
Online ISBN: 978-3-642-23088-2
eBook Packages: Computer Science, Computer Science (R0)