Efficient and scalable labeled subgraph matching using SGMatch

1053 Accesses
32 Citations
Explore all metrics

Abstract

Graphs are natural candidates for modeling application domains, such as social networks, pattern recognition, citation networks, or protein–protein interactions. One of the most challenging tasks in managing graphs is subgraph matching over data graphs, which attempts to find one-to-one correspondences, called solutions, among the query and data nodes. To compute solutions, most contemporary techniques use backtracking and recursion. An open research question is whether graphs can be matched based on parts and local solutions can be combined to reach a global matching. In this paper, we present an approach based on graph decomposition called SGMatch to match graphs. We represent graphs in smaller units called graphlets and develop a matching technique to leverage this representation. Pruning strategies use a new notion of edge covering called minimum hub cover and metadata, such as statistics and inverted indices, to reduce the number of matching candidates. Our evaluation of SGMatch versus contemporary algorithms, i.e., VF2, GraphQL, QuickSI, GADDI, or SPath, shows that SGMatch substantially improves the performance of current state-of-the-art techniques for larger query graphs with different structures, i.e., cliques, paths or subgraphs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

ArcMatch: high-performance subgraph matching for labeled graphs by exploiting edge domains

Article Open access 07 August 2024

DLSM: Distance Label Based Subgraph Matching on GPU

Efficient structure similarity searches: a partition-based approach

Article 24 October 2017

References

Amin MS, Finley RL, Jamil HM (2012) Top-k similar graph matching using TraM in biological networks. IEEE/ACM Trans Comput Biol Bioinform 9(6):1790–1804
Article Google Scholar
Bonnici V, Giugno R, Pulvirenti A, Shasha D, Ferro A (2013) A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinform 14(S–7):S13
Article Google Scholar
Cordella LP, Foggia P, Sansone C, Vento M (2004) A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans Pattern Anal Mach Intell 26(10):1367–1372
Article Google Scholar
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman, San Francisco
MATH Google Scholar
Han W-S, Lee J, Lee J-H (2013) Turbo$_{{\rm iso}}$: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: SIGMOD conference, pp 337–348
He H, Singh AK (2006) Closure-tree: an index structure for graph queries. In: ICDE, p 38
He H, Singh AK(2008) Graphs-at-a-time: query language and access methods for graph databases. In: SIGMOD conference, pp 405–418
Jamil HM (2009) A novel knowledge representation framework for computing sub-graph isomorphic queries in interaction network databases. In ICTAI, pp 131–138
Jamil HM (2011) Computing subgraph isomorphic queries using structural unification and minimum graph structures. In SAC, pp 1053–1058
Lee J, Han W-S, Kasperovics R, Lee J-H (2012) An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2):133–144
Google Scholar
Leskovec J (2016) Stanford large network dataset collection. http://snap.stanford.edu/data/index.html
Messmer BT, Bunke H (2000) Efficient subgraph isomorphism detection: a decomposition approach. IEEE Trans Knowl Data Eng 12(2):307–323
Article Google Scholar
Prechelt L (2000) An empirical comparison of seven programming languages. IEEE Comput 33(10):23–29
Article Google Scholar
Prud’homme C, Fages J-G (2016) CHOCO solver: a Java library for CSP and CP. http://choco-solver.org/
Rivero CR, Hernández I, Ruiz D, Corchuelo R (2013) Benchmarking data exchange among semantic-web ontologies. IEEE Trans Knowl Data Eng 25(9):1997–2009
Article Google Scholar
Rivero CR, Hernández I, Ruiz D, Corchuelo R (2013) Exchanging data amongst linked data applications. Knowl Inf Syst 37(3):693–729
Article Google Scholar
Rivero CR, Jamil HM (2014) On matching graphs using disk-based processing: an XML and XQuery implementation. In: ICDE workshops, pp 20–27
Rivero CR, Jamil HM (2016) SGMatch: implementation, data and query sets, running scripts. http://dblab.nkn.uidaho.edu/SGMatch/
Shang H, Zhang Y, Lin X, Yu JX (2008) Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1):364–375
Google Scholar
Sun Z, Wang H, Wang H, Shao B, Li J (2012) Efficient subgraph matching on billion node graphs. PVLDB 5(9):788–799
Google Scholar
Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph”. PVLDB 7(3):193–204
Google Scholar
Tian Y, Patel JM(2008) TALE: a tool for approximate large graph matching. In: ICDE, pp 963–972
Ullmann JR (1976) An algorithm for subgraph isomorphism. J ACM 23(1):31–42
Article MathSciNet Google Scholar
Yelbay B, Birbil SI, Bülbül K, Jamil HM (2016) Approximating the minimum hub cover problem on planar graphs. Optim Lett 10(1):33–45
Article MathSciNet MATH Google Scholar
Zhang S, Li S, Yang J (2009) GADDI: distance index based subgraph matching in biological networks. In: EDBT, pp 192–203
Zhang S, Yang J, Jin W (2010) SAPPER: subgraph indexing and approximate matching in large graphs. PVLDB 3(1):1185–1194
Google Scholar
Zhao P, Han J (2010) On graph query optimization in large networks. PVLDB 3(1):340–351
Google Scholar
Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2), Article No 6. doi:10.1145/1132956.1132959
Zou Z, Li J, Gao H, Zhang S (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Trans Knowl Data Eng 22(9):1203–1218
Article Google Scholar

Download references

Acknowledgments

This work was completed while Carlos R. Rivero was working as a postdoctoral researcher at the University of Idaho. The ILP formulation discussed in Sect. 3.2 for the computation of minimum hub cover was contributed by Belma Yelbay. We would like to thank the anonymous reviewers and Stanisław P. Radziszowski for their valuable suggestions. This research was supported in part by the National Science Foundation grant DRL 1515550, Idaho National Laboratory grant FFK039, and a University of Idaho Office of Research and Economic Development grant.

Author information

Authors and Affiliations

Department of Computer Science, Rochester Institute of Technology, 102 Lomb Memorial Dr., Rochester, NY, 14623-5608, USA
Carlos R. Rivero
Department of Computer Science, University of Idaho, JEB Room 236, 875 Perimeter Drive, Moscow, ID, 83844-1010, USA
Hasan M. Jamil

Authors

Carlos R. Rivero
View author publications
You can also search for this author in PubMed Google Scholar
Hasan M. Jamil
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Carlos R. Rivero.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rivero, C.R., Jamil, H.M. Efficient and scalable labeled subgraph matching using SGMatch. Knowl Inf Syst 51, 61–87 (2017). https://doi.org/10.1007/s10115-016-0968-2

Download citation

Received: 27 October 2015
Revised: 23 February 2016
Accepted: 24 June 2016
Published: 05 July 2016
Issue Date: April 2017
DOI: https://doi.org/10.1007/s10115-016-0968-2

Efficient and scalable labeled subgraph matching using SGMatch

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

ArcMatch: high-performance subgraph matching for labeled graphs by exploiting edge domains

DLSM: Distance Label Based Subgraph Matching on GPU

Efficient structure similarity searches: a partition-based approach

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Efficient and scalable labeled subgraph matching using SGMatch

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

ArcMatch: high-performance subgraph matching for labeled graphs by exploiting edge domains

DLSM: Distance Label Based Subgraph Matching on GPU

Efficient structure similarity searches: a partition-based approach

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation