Abstract
A critical reality in integration is that knowledge obtained from different sources is often conflicting. Conflict resolution, whether performed during the design phase or at run time, can be costly and, if done without a proper understanding of the usage context, ineffective. In this paper, we propose a novel exploration- and feedback-based approach to conflict resolution when integrating metadata from different sources (FICSR, pronounced “fixer”). Rather than relying on purely automated conflict-resolution mechanisms, FICSR brings the domain expert into the conflict-resolution process and informs the integration based on the expert’s feedback. In particular, instead of relying on the traditional model-based definition of consistency (which, whenever there are conflicts, picks one possible world among many), we introduce a ranked interpretation of the metadata and of statements about the metadata. This not only enables FICSR to avoid committing to an interpretation too early, but also helps achieve a more direct correspondence between the experts’ (subjective) interpretation of the data and the system’s (objective) treatment of the available alternatives. Consequently, the ranked interpretation leads to new opportunities for exploratory feedback for conflict resolution: within the context of a given statement of interest, (a) a preliminary ranking of candidate matches, representing different resolutions of the conflicts, informs the user about the alternative interpretations of the metadata, while (b) user feedback regarding preferences among alternatives is exploited to inform the system about the expert’s relevant domain knowledge. The expert’s feedback is then used to resolve not only the conflicts among different sources, but also possible misalignments due to the initial matching phase.
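As a rough illustration of the difference between picking one possible world and keeping a ranked interpretation, the Python sketch below keeps every maximal conflict-free subset of a small set of statements (the maximal cliques of a compatibility graph, found with basic Bron–Kerbosch), ranks them, and then lets a single expert preference re-order them. This is not FICSR’s algorithm: the statement ids, the integer confidence weights, the conflict pairs, the additive scoring, and the `apply_preference` update rule are all hypothetical.

```python
# Hypothetical statements merged from several sources, with integer confidence
# weights (integers avoid floating-point ties when scores are summed).
weights = {"s1": 9, "s2": 6, "s3": 7, "s4": 4}

# Hypothetical pairs of statements that cannot both hold (assumed to come
# from an earlier conflict analysis).
conflicts = {frozenset(pair) for pair in [("s1", "s2"), ("s3", "s4")]}


def compatible(a, b):
    """Two distinct statements are compatible if they are not in conflict."""
    return a != b and frozenset((a, b)) not in conflicts


def maximal_consistent_sets(stmts):
    """Enumerate maximal conflict-free subsets of statements, i.e. the maximal
    cliques of the compatibility graph, with basic Bron-Kerbosch (no pivot)."""
    results = []

    def expand(r, p, x):
        if not p and not x:
            results.append(frozenset(r))
            return
        for v in list(p):
            nbrs = {u for u in stmts if compatible(u, v)}
            expand(r | {v}, p & nbrs, x & nbrs)
            p = p - {v}
            x = x | {v}

    expand(set(), set(stmts), set())
    return results


def rank(interpretations, w):
    """Order interpretations by total weight (highest first); ties are broken
    by statement ids so the output is deterministic."""
    return sorted(interpretations, key=lambda s: (-sum(w[x] for x in s), sorted(s)))


def apply_preference(w, preferred, rejected, step=2):
    """Hypothetical feedback rule: the expert prefers `preferred` over
    `rejected`, so statements only in the preferred interpretation gain
    weight and statements only in the rejected one lose weight."""
    new_w = dict(w)
    for s in preferred - rejected:
        new_w[s] += step
    for s in rejected - preferred:
        new_w[s] = max(0, new_w[s] - step)
    return new_w


interpretations = maximal_consistent_sets(weights)
before = rank(interpretations, weights)
# The expert says {s2, s3} is better than the current top choice {s1, s3}.
feedback = apply_preference(weights, frozenset({"s2", "s3"}), frozenset({"s1", "s3"}))
after = rank(interpretations, feedback)

print([sorted(s) for s in before])  # [['s1', 's3'], ['s1', 's4'], ['s2', 's3'], ['s2', 's4']]
print([sorted(s) for s in after])   # [['s2', 's3'], ['s1', 's3'], ['s2', 's4'], ['s1', 's4']]
```

Because the alternatives are enumerated once and only re-ranked, feedback changes which interpretation is presented first without discarding the others; in this toy input the preferred interpretation moves from third to first place.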
To enable this (\(\text{system} \stackrel{\text{informs}}{\longleftrightarrow} \text{user}\)) feedback process, we develop data structures and algorithms for efficient off-line conflict/agreement analysis of the integrated metadata. We also develop algorithms for efficient on-line query processing, candidate result enumeration, validity analysis, and system feedback. The results are brought together and evaluated in the Feedback-based InConSistency Resolution (FICSR) system.
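To make “off-line conflict/agreement analysis” concrete, here is a minimal sketch, assuming the integrated metadata is a set of (source, child, parent) taxonomy edges and that every node should have exactly one parent. One pass over the edges builds a child → parent → sources index; a parent asserted by more than one source counts as an agreement, and a child given more than one parent counts as a conflict. The edge data, the single-parent constraint, and the `analyze` function are hypothetical and are not taken from the FICSR system.

```python
from collections import defaultdict

# Hypothetical integrated metadata: (source, child, parent) edges contributed
# by three source taxonomies A, B and C.
edges = [
    ("A", "bowl", "ceramic"),
    ("B", "bowl", "ceramic"),
    ("B", "blade", "tool"),
    ("C", "blade", "lithic"),
]


def analyze(edges):
    """One-pass agreement/conflict analysis, assuming each node should have
    exactly one parent.

    Returns (agreements, conflicts):
      agreements: (child, parent) -> sources, for edges asserted by >1 source
      conflicts:  child -> {parent: sources}, for children given >1 parent
    """
    support = defaultdict(lambda: defaultdict(set))  # child -> parent -> sources
    for src, child, parent in edges:
        support[child][parent].add(src)

    agreements, conflicts = {}, {}
    for child, parents in support.items():
        for parent, srcs in parents.items():
            if len(srcs) > 1:
                agreements[(child, parent)] = set(srcs)
        if len(parents) > 1:
            conflicts[child] = {p: set(s) for p, s in parents.items()}
    return agreements, conflicts


agreements, conflicts = analyze(edges)
print(agreements)
print(conflicts)
```

Since the index is built once, on-line steps such as ranking candidate interpretations can look up conflicting or corroborated edges without rescanning the sources.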
Additional information
This research has been funded by the NSF grant AOC: Archaeological Data Integration for the Study of Long-Term Human and Social Dynamics (2007–2009).
This work was done while M. L. Sapino was at ASU on sabbatical.
Cite this article
Candan, K.S., Cao, H., Qi, Y. et al. System support for exploration and expert feedback in resolving conflicts during integration of metadata. The VLDB Journal 17, 1407–1444 (2008). https://doi.org/10.1007/s00778-008-0109-y
DOI: https://doi.org/10.1007/s00778-008-0109-y