Abstract
Graphs are natural candidates for modeling application domains, such as social networks, pattern recognition, citation networks, or protein–protein interactions. One of the most challenging tasks in managing graphs is subgraph matching over data graphs, which attempts to find one-to-one correspondences, called solutions, among the query and data nodes. To compute solutions, most contemporary techniques use backtracking and recursion. An open research question is whether graphs can be matched based on parts and local solutions can be combined to reach a global matching. In this paper, we present an approach based on graph decomposition called SGMatch to match graphs. We represent graphs in smaller units called graphlets and develop a matching technique to leverage this representation. Pruning strategies use a new notion of edge covering called minimum hub cover and metadata, such as statistics and inverted indices, to reduce the number of matching candidates. Our evaluation of SGMatch versus contemporary algorithms, i.e., VF2, GraphQL, QuickSI, GADDI, or SPath, shows that SGMatch substantially improves the performance of current state-of-the-art techniques for larger query graphs with different structures, i.e., cliques, paths or subgraphs.
Similar content being viewed by others
References
Amin MS, Finley RL, Jamil HM (2012) Top-k similar graph matching using TraM in biological networks. IEEE/ACM Trans Comput Biol Bioinform 9(6):1790–1804
Bonnici V, Giugno R, Pulvirenti A, Shasha D, Ferro A (2013) A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinform 14(S–7):S13
Cordella LP, Foggia P, Sansone C, Vento M (2004) A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans Pattern Anal Mach Intell 26(10):1367–1372
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman, San Francisco
Han W-S, Lee J, Lee J-H (2013) Turbo\(_{{\rm iso}}\): towards ultrafast and robust subgraph isomorphism search in large graph databases. In: SIGMOD conference, pp 337–348
He H, Singh AK (2006) Closure-tree: an index structure for graph queries. In: ICDE, p 38
He H, Singh AK(2008) Graphs-at-a-time: query language and access methods for graph databases. In: SIGMOD conference, pp 405–418
Jamil HM (2009) A novel knowledge representation framework for computing sub-graph isomorphic queries in interaction network databases. In ICTAI, pp 131–138
Jamil HM (2011) Computing subgraph isomorphic queries using structural unification and minimum graph structures. In SAC, pp 1053–1058
Lee J, Han W-S, Kasperovics R, Lee J-H (2012) An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2):133–144
Leskovec J (2016) Stanford large network dataset collection. http://snap.stanford.edu/data/index.html
Messmer BT, Bunke H (2000) Efficient subgraph isomorphism detection: a decomposition approach. IEEE Trans Knowl Data Eng 12(2):307–323
Prechelt L (2000) An empirical comparison of seven programming languages. IEEE Comput 33(10):23–29
Prud’homme C, Fages J-G (2016) CHOCO solver: a Java library for CSP and CP. http://choco-solver.org/
Rivero CR, Hernández I, Ruiz D, Corchuelo R (2013) Benchmarking data exchange among semantic-web ontologies. IEEE Trans Knowl Data Eng 25(9):1997–2009
Rivero CR, Hernández I, Ruiz D, Corchuelo R (2013) Exchanging data amongst linked data applications. Knowl Inf Syst 37(3):693–729
Rivero CR, Jamil HM (2014) On matching graphs using disk-based processing: an XML and XQuery implementation. In: ICDE workshops, pp 20–27
Rivero CR, Jamil HM (2016) SGMatch: implementation, data and query sets, running scripts. http://dblab.nkn.uidaho.edu/SGMatch/
Shang H, Zhang Y, Lin X, Yu JX (2008) Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1):364–375
Sun Z, Wang H, Wang H, Shao B, Li J (2012) Efficient subgraph matching on billion node graphs. PVLDB 5(9):788–799
Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph”. PVLDB 7(3):193–204
Tian Y, Patel JM(2008) TALE: a tool for approximate large graph matching. In: ICDE, pp 963–972
Ullmann JR (1976) An algorithm for subgraph isomorphism. J ACM 23(1):31–42
Yelbay B, Birbil SI, Bülbül K, Jamil HM (2016) Approximating the minimum hub cover problem on planar graphs. Optim Lett 10(1):33–45
Zhang S, Li S, Yang J (2009) GADDI: distance index based subgraph matching in biological networks. In: EDBT, pp 192–203
Zhang S, Yang J, Jin W (2010) SAPPER: subgraph indexing and approximate matching in large graphs. PVLDB 3(1):1185–1194
Zhao P, Han J (2010) On graph query optimization in large networks. PVLDB 3(1):340–351
Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2), Article No 6. doi:10.1145/1132956.1132959
Zou Z, Li J, Gao H, Zhang S (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Trans Knowl Data Eng 22(9):1203–1218
Acknowledgments
This work was completed while Carlos R. Rivero was working as a postdoctoral researcher at the University of Idaho. The ILP formulation discussed in Sect. 3.2 for the computation of minimum hub cover was contributed by Belma Yelbay. We would like to thank the anonymous reviewers and Stanisław P. Radziszowski for their valuable suggestions. This research was supported in part by the National Science Foundation grant DRL 1515550, Idaho National Laboratory grant FFK039, and a University of Idaho Office of Research and Economic Development grant.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Rivero, C.R., Jamil, H.M. Efficient and scalable labeled subgraph matching using SGMatch. Knowl Inf Syst 51, 61–87 (2017). https://doi.org/10.1007/s10115-016-0968-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-016-0968-2