Abstract
With the easy availability of network data, there is an increasing need for extracting the compact and meaningful directed acyclic graph (DAG) patterns from network databases. There exists no specific method of sub-DAG pattern mining from the network databases with cyclic graphs. This paper presents a novel method, MH-DAGMiner, for mining maximal hierarchical sub-DAG patterns from a directed and weighted network database. To avoid the exhaustive steps of candidate generation, frequency counting and non-maximal patterns pruning, we propose a novel and effective FP-DAG structure based on edge grouping. We propose the optimum branching method to identify the vertex hierarchy within each maximal edge group generated from FP-DAG, and discover the maximal hierarchical sub-DAG patterns. The effectiveness of MH-DAGMiner is tested using several real-world network datasets and synthetic graph datasets. We analyzed the quality of MH-DAGMiner in comparison with a Naive-DAGMiner method which mines DAG patterns from a preprocessed cyclic graph (DAG) database. MH-DAGMiner is found to be more effective than Naive-DAGMiner with lower average information loss. MH-DAGMiner is also found to be more effective than the state-of-the-art benchmarking algorithms such as gSpan, FP-GraphMiner, and MFSH-TreeMiner where maximal hierarchical DAG patterns are identified with a post-processing step.














Similar content being viewed by others
Notes
In tree terminology, this is same as maximum spanning forest.
In tree terminology, this is same as maximum spanning tree.
Implementation is available via http://edmonds-alg.sourceforge.net/.
References
Aggarwal CC, Han J (eds) (2014) Frequent pattern mining. Springer, Berlin, pp 1–17
Bonchi F (2011) Influence propagation in social networks: a data mining perspective. IEEE Intell Inform Bull 12(1):8–105
Chen YL, Kao HP, Ko MT (2004) Mining DAG patterns from DAG databases. In: International conference on web-age information management, pp 579–588
Cooley R, Mobasher B, Srivastava J (1997) Web mining: information and pattern discovery on the world wide web. In: IEEE international conference on tools with artificial intelligence, pp 558–567
Edmonds J (1968) Optimum branchings. Math Decis Sci 1:335–345
Fariha A, Ahmed CF, Leung CK et al (2015) A new framework for mining frequent interaction patterns from meeting databases. Eng Appl Artif Intell 45:103–118
Gupte M, Shankar P, Li J et al (2011) Finding hierarchy in directed online social networks. In: International conference on world wide web WWW ’11, pp 557–566
Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: IEEE international conference on data mining (ICDM), pp 549–552
Huan J, Wang W, Prins J et al (2004) SPIN: mining maximal frequent subgraphs from graph databases. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 581–586
Hwang S, Wei C, Yang W (2004) Discovery of temporal patterns from process instances. Comput Ind 53(3):345–364
Inokuchi A, Washio T, Motoda H (2000) An a priori-based algorithm for mining frequent substructures from graph data. In: European conference on principles of data mining and knowledge discovery, pp 13–23
Jiadong R, HuiFang W, Yue M et al (2015) Efficient software fault localization by hierarchical instrumentation and maximal frequent subgraph mining. Int J Innov Comput Inf Control 11(6):1897–1911
Jiang C, Coenen F, Zito M (2013) A survey of frequent subgraph mining algorithms. Knowl Eng Rev 28(1):75–105
Johnsonbaugh R, Kalin M (1991) A graph generation software package. ACM SIGCSE Bull 23(1):151–154
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: ACM SIGKDD international conference on Knowledge discovery and data mining, pp 137–146
Leung CW (2010) Technical notes on extending gSpan to directed graphs. Technical Report, Management University, Singapore
Li Y, Lin Q, Zhong G, Duan D et al (2009) A directed labeled graph frequent pattern mining algorithm based on minimum code. In: International conference on multimedia and ubiquitous engineering, pp 353–359
Nijssen S, Kok J N (2004) A quickstart in frequent structure mining can make a difference. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 647–652
Sreenivasa GJ, Ananthanarayana VS (2006) Efficient mining of frequent rooted continuous directed subgraphs. In: International conference on advanced computing and communications (ADCOM), pp 553–558
Tarjan RE (1977) Finding optimum branchings. Networks 7(1):25–3
Termier A, Tamada Y, Numata K et al (2007) DIGDAG, a first algorithm to mine closed frequent embedded sub-DAGs. In: Mining and learning with graphs workshop (MLG’07), pp 41–45
Thomas LT, Valluri SR, Karlapalem K (2010) MARGIN: maximal frequent subgraph mining. ACM Trans Knowl Discov Data (TKDD) 4(3):10:1–10:42
Van der Aalst W, Weijters T, Maruster L (2004) Workflow mining: discovering process models from event logs. IEEE Trans Knowl Data Eng 16(9):1128–1142
Vijayalakshmi R, Rethnasamy N, Roddick JF et al (2001) FP-GraphMiner: a fast frequent pattern mining algorithm for network graphs. J Graph Algorithms Appl 15(6):753–776
Werth T, Dreweke A, Wörlein M et al (2008) DAGMA: mining directed acyclic graphs. In: IADIS European conference on data mining, pp 11–18
Yan X, Han J (2002) Gspan: graph-based substructure pattern mining. In: IEEE international conference on data mining (ICDM ’02), pp 721–724
Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 286–295
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Tennakoon, T.M.G., Nayak, R. MH-DAGMiner: maximal hierarchical sub-DAG mining in directed weighted networks. Knowl Inf Syst 61, 431–462 (2019). https://doi.org/10.1007/s10115-018-1300-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-018-1300-0