On Leveraging Crowdsourcing Techniques for Schema Matching Networks

Nguyen Quoc Viet Hung, Nguyen Thanh Tam, Zoltán Miklós & Karl Aberer

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7826)

Included in the following conference series: International Conference on Database Systems for Advanced Applications

Abstract

As the number of publicly available datasets is likely to keep growing, the demand for establishing links between these datasets is rising as well. Creating such links requires matching their schemas. Moreover, to use these datasets in meaningful ways, one often needs to match not just two but several schemas. This matching process establishes a (potentially large) set of attribute correspondences between multiple schemas, which together constitute a schema matching network. Various commercial and academic schema matching tools have been developed to support this task. However, because matching is inherently uncertain, the heuristic techniques adopted by these tools produce results that are not completely correct. Thus, in practice, post-matching effort by a human expert is needed to obtain a correct set of attribute correspondences.

To address this problem, our paper demonstrates how to leverage crowdsourcing techniques to validate the generated correspondences. We design validation questions with contextual information that can effectively guide the crowd workers, and we analyze how to reduce the overall human effort needed for this validation task. Through theoretical and empirical results, we show that by harnessing natural constraints defined on top of the schema matching network, one can significantly reduce the necessary human work.
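
The idea of using network constraints to save validation effort can be illustrated with a minimal sketch. The abstract does not name the constraints, so the one-to-one constraint used here (an attribute matches at most one attribute in any other schema) is an assumption, as are the schema and attribute names and the helper functions; this is not the authors' algorithm.

```python
from collections import namedtuple

# A candidate correspondence links two attributes; each endpoint is a
# (schema, attribute) pair. All names below are hypothetical.
Correspondence = namedtuple("Correspondence", ["src", "dst"])


def violates_one_to_one(a, b):
    """True if a and b map one attribute to two different attributes
    of the same other schema (assumed one-to-one constraint)."""
    if a == b:
        return False
    for x, x_other in ((a.src, a.dst), (a.dst, a.src)):
        for y, y_other in ((b.src, b.dst), (b.dst, b.src)):
            same_attribute = x == y
            same_target_schema = x_other[0] == y_other[0]
            if same_attribute and same_target_schema and x_other != y_other:
                return True
    return False


def propagate_approval(approved, candidates):
    """Split candidates into those still needing a crowd answer and
    those ruled out by the approved correspondence."""
    ruled_out = [c for c in candidates if violates_one_to_one(approved, c)]
    still_open = [c for c in candidates
                  if c != approved and c not in ruled_out]
    return still_open, ruled_out


candidates = [
    Correspondence(("S1", "releaseDate"), ("S2", "date")),
    Correspondence(("S1", "releaseDate"), ("S2", "screeningDate")),
    Correspondence(("S1", "title"), ("S2", "name")),
]

# Suppose the crowd approves the first candidate.
still_open, ruled_out = propagate_approval(candidates[0], candidates)
print(len(still_open), "still open;", len(ruled_out), "ruled out")
```

In this toy setting a single crowd answer settles two of the three candidates: the approved correspondence itself and the one that conflicts with it, so only one question remains to be asked.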

Author information

Authors and Affiliations

École Polytechnique Fédérale de Lausanne, Switzerland
Nguyen Quoc Viet Hung, Nguyen Thanh Tam & Karl Aberer
Université de Rennes 1, France
Zoltán Miklós

Editor information

Editors and Affiliations

Department of Computer Science, Binghamton University, 13902, Binghamton, NY, USA
Weiyi Meng
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Ling Feng
Department of Computer Science, National University of Singapore, 117417, Singapore
Stéphane Bressan
Research Group Data Analytics and Computing, University of Vienna, 1090, Vienna, Austria
Werner Winiwarter
School of Computer, Wuhan University, 430072, Wuhan, China
Wei Song

About this paper

Cite this paper

Hung, N.Q.V., Tam, N.T., Miklós, Z., Aberer, K. (2013). On Leveraging Crowdsourcing Techniques for Schema Matching Networks. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37450-0_10

DOI: https://doi.org/10.1007/978-3-642-37450-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37449-4
Online ISBN: 978-3-642-37450-0
eBook Packages: Computer Science, Computer Science (R0)
