[go: up one dir, main page]

Skip to main content

FastAGEDs: Fast Approximate Graph Entity Dependency Discovery

  • Conference paper
  • First Online:
Web Information Systems Engineering – WISE 2023 (WISE 2023)

Abstract

This paper studies the discovery of approximate rules in property graphs. First, we propose a semantically meaningful measure of error for mining graph entity dependencies (GEDs) that almost hold, to tolerate errors and inconsistencies that exist in real-world graphs. Second, we present a new characterisation of GED satisfaction, and devise a depth-first search strategy to traverse the search space of candidate GEDs efficiently. Further, we perform experiments to demonstrate the feasibility and scalability of our solution, FastAGEDs, with three real-world graphs. The results show FastAGEDs is effective and efficient for mining approximate GEDs in noisy and erroneous real-world graphs.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 99.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 129.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    simply H when the context is clear.

  2. 2.

    the max. |adom(X)| of variable literals is \(2\), whereas that of constant literals is \(1\).

  3. 3.

    we adopt this metric due to its efficiency and anti-monotonic property.

  4. 4.

    for brevity, we represent items by their lowercase letters, e.g. \(a_1\) is \(A_1[x;EA]\), \(a_3^1\) is \(A_3[y;Soccer]\).

  5. 5.

    IMDB dataset. http://www.imdb.com/interfaces.

  6. 6.

    YAGO4 dataset. https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/.

  7. 7.

    DBLP dataset. https://dblp.uni-trier.de/xml/.

References

  1. Aberger, C.R., Lamb, A., Tu, S., Nötzli, A., Olukotun, K., Ré, C.: Emptyheaded: a relational engine for graph processing. ACM Trans. Database Syst. (TODS) 42(4), 1–44 (2017)

    Article  MathSciNet  Google Scholar 

  2. Alipourlangouri, M., Chiang, F.: Keyminer: discovering keys for graphs. In: VLDB Workshop TD-LSG (2018)

    Google Scholar 

  3. Bleifuß, T., et al.: Approximate discovery of functional dependencies for large datasets. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1803–1812 (2016)

    Google Scholar 

  4. Bringmann, B., Nijssen, S.: What is frequent in a single graph? In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 858–863. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68125-0_84

    Chapter  Google Scholar 

  5. Caruccio, L., Deufemia, V., Polese, G.: Mining relaxed functional dependencies from data. Data Min. Knowl. Disc. 34(2), 443–477 (2020)

    Article  MathSciNet  MATH  Google Scholar 

  6. Elseidy, M., Abdelhamid, E., Skiadopoulos, S., Kalnis, P.: GRAMI: frequent subgraph and pattern mining in a single large graph. Proc. VLDB Endowment 7(7), 517–528 (2014)

    Article  Google Scholar 

  7. Fan, W., Fan, Z., Tian, C., Dong, X.L.: Keys for graphs. Proc. VLDB Endowment 8(12), 1590–1601 (2015)

    Article  Google Scholar 

  8. Fan, W., Chunming, H., Liu, X., Ping, L.: Discovering graph functional dependencies. ACM Trans. Database Syst. (TODS) 45(3), 1–42 (2020)

    Article  MathSciNet  Google Scholar 

  9. Fan, W., Lu, P.: Dependencies for graphs. In: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 403–416 (2017)

    Google Scholar 

  10. Fan, W., Ping, L.: Dependencies for graphs. ACM Trans. Database Syst. (TODS) 44(2), 1–40 (2019)

    Article  MathSciNet  MATH  Google Scholar 

  11. Fan, W., Ping, L., Tian, C., Zhou, J.: Deducing certain fixes to graphs. Proc. VLDB Endowment 12(7), 752–765 (2019)

    Article  Google Scholar 

  12. Fan, W., Wu, Y., Xu, J.: Functional dependencies for graphs. In: Proceedings of the 2016 International Conference on Management of Data, pp. 1843–1857 (2016)

    Google Scholar 

  13. Giannella, C., Robertson, E.: On approximation measures for functional dependencies. Inf. Syst. 29(6), 483–507 (2004)

    Article  Google Scholar 

  14. Golab, L., Karloff, H., Korn, F., Srivastava, D., Bei, Yu.: On generating near-optimal tableaux for conditional functional dependencies. Proc. VLDB Endowment 1(1), 376–390 (2008)

    Article  Google Scholar 

  15. Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE: an efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42(2), 100–111 (1999)

    Article  MATH  Google Scholar 

  16. Kivinen, J., Mannila, H.: Approximate inference of functional dependencies from relations. Theoret. Comput. Sci. 149(1), 129–149 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  17. Koudas, N., Saha, A., Srivastava, D., Venkatasubramanian, S.: Metric functional dependencies. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 1275–1278. IEEE (2009)

    Google Scholar 

  18. Kramer, S., Pfahringer, B.: E cient search for strong partial determinations. In: Proceedings of the International Conference on Knowledge Discover and Data Mining, pp. 371–378. Citeseer (1996)

    Google Scholar 

  19. Kruse, S., Naumann, F.: Efficient discovery of approximate dependencies. Proc. VLDB Endowment 11(7), 759–772 (2018)

    Article  Google Scholar 

  20. Kwashie, S., Liu, J., Li, J., Ye, F.: Mining differential dependencies: a subspace clustering approach. In: Wang, H., Sharaf, M.A. (eds.) ADC 2014. LNCS, vol. 8506, pp. 50–61. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08608-8_5

    Chapter  Google Scholar 

  21. Kwashie, S., Liu, J., Li, J., Ye, F.: Conditional differential dependencies (CDDs). In: Morzy, T., Valduriez, P., Bellatreche, L. (eds.) ADBIS 2015. LNCS, vol. 9282, pp. 3–17. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23135-8_1

    Chapter  Google Scholar 

  22. Kwashie, S., Liu, J., Li, J., Ye, F.: Efficient discovery of differential dependencies through association rules mining. In: Sharaf, M.A., Cheema, M.A., Qi, J. (eds.) ADC 2015. LNCS, vol. 9093, pp. 3–15. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19548-3_1

    Chapter  Google Scholar 

  23. Kwashie, S., Liu, L., Liu, J., Stumptner, M., Li, J., Yang, L.: Certus: an effective entity resolution approach with graph differential dependencies (GDDs). Proc. VLDB Endowment 12(6), 653–666 (2019)

    Article  Google Scholar 

  24. Lin, P., Song, Q., Yinghui, W.: Fact checking in knowledge graphs with ontological subgraph patterns. Data Sci. Eng. 3(4), 341–358 (2018)

    Article  Google Scholar 

  25. Liu, D., et al.: An efficient approach for discovering graph entity dependencies (GEDs). arXiv preprint arXiv:2301.06264 (2023)

  26. Liu, J., Kwashie, S., Li, J., Ye, F., Vincent, M.: Discovery of approximate differential dependencies. arXiv preprint arXiv:1309.3733 (2013)

  27. Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data-a review. IEEE Trans. Knowl. Data Eng. 24(2), 251–264 (2010)

    Article  Google Scholar 

  28. Ma, H., Alipourlangouri, M., Yinghui, W., Chiang, F., Pi, J.: Ontology-based entity matching in attributed graphs. Proc. VLDB Endowment 12(10), 1195–1207 (2019)

    Article  Google Scholar 

  29. Mandros, P., Boley, M., Vreeken, J.: Discovering reliable approximate functional dependencies. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 355–363 (2017)

    Google Scholar 

  30. Mannila, H., Räihä, K.J.: Dependency inference. In Proceedings of the 13th International Conference on Very Large Data Bases, pp. 155–158 (1987)

    Google Scholar 

  31. Mannila, H., Räihä, K.-J.: Algorithms for inferring functional dependencies from relations. Data Knowl. Eng. 12(1), 83–99 (1994)

    Article  MATH  Google Scholar 

  32. Medina, R., Nourine, L.: A unified hierarchy for functional dependencies, conditional functional dependencies and association rules. In: Ferré, S., Rudolph, S. (eds.) ICFCA 2009. LNCS (LNAI), vol. 5548, pp. 98–113. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01815-2_9

    Chapter  Google Scholar 

  33. Mhedhbi, A., Salihoglu, S.: Optimizing subgraph queries by combining binary and worst-case optimal joins. Proc. VLDB Endowment 12(11)

    Google Scholar 

  34. Piatetsky-Shapiro, G.: Probabilistic data dependencies. In: Machine Discovery Workshop (Aberdeen, Scotland) (1992)

    Google Scholar 

  35. Reimherr, M., Nicolae, D.L.: On quantifying dependence: a framework for developing interpretable measures (2013)

    Google Scholar 

  36. Teixeira, C.H., Fonseca, A.J., Serafini, M., Siganos, G., Zaki, M.J., Aboulnaga, A.: Arabesque: a system for distributed graph mining. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 425–440 (2015)

    Google Scholar 

  37. Weytjens, S.: Approximate functional dependencies: a comparison of measures and a relevance focused tool for discovery (2021)

    Google Scholar 

Download references

Acknowledgment

This research project was supported in part by the following grant schemes: the Major Project of Hubei Hongshan Laboratory under Grant 2022HSZD031; the Innovation fund of Chinese Marine Defense Technology Innovation Center under Grant JJ-2021-722-04; the open funds of Hubei Three Gorges Laboratory, the Fundamental Research Funds for the Chinese Central Universities under Grant 2662023XXPY004, 2662022JC004; the open funds of the National Key Laboratory of Crop Genetic Improvement under Grant ZK202203, Huzhong Agricultural University; and the Inner Mongolia Key Scientific and Technological Project under Grant 2021SZD0099.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Shanmei Liu or Zaiwen Feng .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Zhou, G. et al. (2023). FastAGEDs: Fast Approximate Graph Entity Dependency Discovery. In: Zhang, F., Wang, H., Barhamgi, M., Chen, L., Zhou, R. (eds) Web Information Systems Engineering – WISE 2023. WISE 2023. Lecture Notes in Computer Science, vol 14306. Springer, Singapore. https://doi.org/10.1007/978-981-99-7254-8_35

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-7254-8_35

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7253-1

  • Online ISBN: 978-981-99-7254-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics