[go: up one dir, main page]

Skip to main content

A Statistical Test of Heterogeneous Subgraph Densities to Assess Clusterability

  • Conference paper
  • First Online:
Learning and Intelligent Optimization (LION 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11968))

Included in the following conference series:

Abstract

Determining if a graph displays a clustered structure prior to subjecting it to any cluster detection technique has recently gained attention in the literature. Attempts to group graph vertices into clusters when a graph does not have a clustered structure is not only a waste of time; it will also lead to misleading conclusions. To address this problem, we introduce a novel statistical test, the \(\delta \)-test, which is based on comparisons of local and global densities. Our goal is to assess whether a given graph meets the necessary conditions to be meaningfully summarized by clusters of vertices. We empirically explore our test’s behavior under a number of graph structures. We also compare it to other recently published tests. From a theoretical standpoint, our test is more general, versatile and transparent than recently published competing techniques. It is based on the examination of intuitive quantities, applies equally to weighted and unweighted graphs and allows comparisons across graphs. More importantly, it does not rely on any distributional assumptions, other than the universally accepted definition of a clustered graph. Empirically, our test is shown to be more responsive to graph structure than other competing tests.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    Also, note that in this article we assume undirected graphs with no self-loops.

References

  1. Albert, R., BarabĂ¡si, A.L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002)

    Article  MathSciNet  Google Scholar 

  2. Aleskerov, F., Goldengorin, B., Pardalos, P.: Clusters, Orders, and Trees: Methods and Applications. Springer, Heidelberg (2014). https://doi.org/10.1007/978-1-4939-0742-7. Incorporated

    Book  MATH  Google Scholar 

  3. Alon, N., Shapira, A.: A characterization of the (natural) graph properties testable with one-sided error. SIAM J. Comput. 37(6), 1703–1727 (2008)

    Article  MathSciNet  Google Scholar 

  4. Arias-Castro, E., Verzelen, N.: Community detection in dense random networks. Ann. Statist. 42(3), 940–969 (2014). https://doi.org/10.1214/14-AOS1208

    Article  MathSciNet  MATH  Google Scholar 

  5. BarabĂ¡si, A.L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)

    Article  MathSciNet  Google Scholar 

  6. Barrat, A., Barthélemy, M., Pastor-Satorras, R., Vespignani, A.: The architecture of complex weighted networks. Proc. Natl. Acad. Sci. 101, 3747–3752 (2004)

    Article  Google Scholar 

  7. Bickel, P.J., Sarkar, P.: Hypothesis testing for automated community detection in networks. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 78(1), 253–273 (2016). https://doi.org/10.1111/rssb.12117

    Article  MathSciNet  MATH  Google Scholar 

  8. Butenko, S., Chaovalitwongse, W.A., Pardalos, P.M.: Clustering Challenges in Biological Networks. World Scientific, Singapore (2009). https://doi.org/10.1142/6602

    Book  Google Scholar 

  9. Chiplunkar, A., Kapralov, M., Khanna, S., Mousavifar, A., Peres, Y.: Testing graph clusterability: algorithms and lower bounds. ArXiv e-prints, August 2018

    Google Scholar 

  10. Czumaj, A., Peng, P., Sohler, C.: Testing cluster structure of graphs. ArXiv e-prints, April 2015

    Google Scholar 

  11. Eden, T., Ron, D., Seshadhri, C.: On Approximating the number of \(k\)-cliques in sublinear time. ArXiv e-prints, March 2018

    Google Scholar 

  12. Elenberg, E.R., Shanmugam, K., Borokhovich, M., Dimakis, A.G.: Beyond triangles: a distributed framework for estimating 3-profiles of large graphs. ArXiv e-prints, June 2015

    Google Scholar 

  13. Erdös, P., Rényi, A.: On random graphs I. Publ. Math. Debr. 6, 290 (1959)

    MATH  Google Scholar 

  14. Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010)

    Article  MathSciNet  Google Scholar 

  15. Fortunato, S., Hric, D.: Community detection in networks: a user guide. ArXiv e-prints, November 2016

    Article  MathSciNet  Google Scholar 

  16. Fronczak, A., HoÅ‚yst, J.A., Jedynak, M., Sienkiewicz, J.: Higher order clustering coefficients in BarabĂ¡si-Albert networks. Phys. Stat. Mech. Its Appl. 316, 688–694 (2002)

    Article  Google Scholar 

  17. Gao, C., Lafferty, J.: Testing for global network structure using small subgraph statistics. ArXiv e-prints (Oct 2017)

    Google Scholar 

  18. Gao, C., Lafferty, J.: Testing network structure using relations between small subgraph probabilities. ArXiv e-prints, April 2017

    Google Scholar 

  19. Gishboliner, L., Shapira, A.: Deterministic vs non-deterministic graph property testing. ArXiv e-prints, April 2013

    Google Scholar 

  20. Goldreich, O., Ron, D.: Algorithmic aspects of property testing in the dense graphs model. SIAM J. Comput. 40(2), 376–445 (2011)

    Article  MathSciNet  Google Scholar 

  21. Hagberg, A., Schult, D., Swart, P.: Exploring network structure, dynamics, and function using network. In: Varoquaux, G., Vaught, T., Millman, J. (eds.) Proceedings of the 7th Python in Science Conference, Pasadena, CA USA, pp. 11–15 (2008)

    Google Scholar 

  22. He, Z., Liang, H., Chen, Z., Zhao, C.: Detecting statistically significant communities. CoRR abs/1806.05602 (2018). http://arxiv.org/abs/1806.05602

  23. Jin, J., Ke, Z.T., Luo, S.: Network global testing by counting graphlets. ArXiv e-prints, July 2018

    Google Scholar 

  24. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1), 2 (2007)

    Article  Google Scholar 

  25. LovĂ¡sz, L., Vesztergombi, K.: Nondeterministic graph property testing. Comb. Probab. Comput. 22, 749–762 (2013)

    Article  Google Scholar 

  26. Miasnikof, P., Shestopaloff, A.Y., Bonner, A.J., Lawryshyn, Y.: A statistical performance analysis of graph clustering algorithms. In: Bonato, A., Prałat, P., Raigorodskii, A. (eds.) WAW 2018. LNCS, vol. 10836, pp. 170–184. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92871-5_11

    Chapter  Google Scholar 

  27. Ostroumova Prokhorenkova, L., Prałat, P., Raigorodskii, A.: Modularity of complex networks models. In: Bonato, A., Graham, F.C., Prałat, P. (eds.) WAW 2016. LNCS, vol. 10088, pp. 115–126. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49787-7_10

    Chapter  MATH  Google Scholar 

  28. Prokhorenkova, L.O., Prałat, P., Raigorodskii, A.: Modularity in several random graph models. Electron. Notes Discret. Math. 61, 947–953 (2017). http://www.sciencedirect.com/science/article/pii/S1571065317302238. The European Conference on Combinatorics, Graph Theory and Applications (EUROCOMB 2017)

  29. Prokhorenkova, L., Tikhonov, A.: Community detection through likelihood optimization: in search of a sound model. In: Proceedings of the 2019 World Wide Web Conference (WWW 2019) (2019)

    Google Scholar 

  30. Schaeffer, S.E.: Survey: graph clustering. Comput. Sci. Rev. 1(1), 27–64 (2007). https://doi.org/10.1016/j.cosrev.2007.05.001

    Article  MATH  Google Scholar 

  31. Ugander, J., Backstrom, L., Kleinberg, J.: Subgraph frequencies: mapping the empirical and extremal geography of large graph collections. ArXiv e-prints, April 2013

    Google Scholar 

  32. Verzelen, N., Arias-Castro, E.: Community detection in sparse random networks. Ann. Appl. Probab. 25(6), 3465–3510 (2015). https://doi.org/10.1214/14-AAP1080

    Article  MathSciNet  MATH  Google Scholar 

  33. Yang, J., Leskovec, J.: Defining and evaluating network communities based on Ground-truth. CoRR abs/1205.6233 (2012). http://arxiv.org/abs/1205.6233

  34. Yin, H., Benson, A.R., Leskovec, J.: Higher-order clustering in networks. ArXiv e-prints (2018)

    Google Scholar 

  35. Yin, H., Benson, A., Leskovec, J.: Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2017 (2017)

    Google Scholar 

Download references

Acknowledgments

Pierre Miasnikof was supported by a Mitacs-Accelerate PhD award IT05806. He also wishes to thank Lasse Leskelä of Aalto University, for the introduction to the work of Gao and Lafferty. Liudmila Prokhorenkova and Andrei Raigorodskii were supported by The Russian Science Foundation (grant number 16-11-10014).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Pierre Miasnikof .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Miasnikof, P., Prokhorenkova, L., Shestopaloff, A.Y., Raigorodskii, A. (2020). A Statistical Test of Heterogeneous Subgraph Densities to Assess Clusterability. In: Matsatsinis, N., Marinakis, Y., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2019. Lecture Notes in Computer Science(), vol 11968. Springer, Cham. https://doi.org/10.1007/978-3-030-38629-0_2

Download citation

Publish with us

Policies and ethics