A Statistical Test of Heterogeneous Subgraph Densities to Assess Clusterability

Pierre Miasnikof, Liudmila Prokhorenkova, Alexander Y. Shestopaloff & Andrei Raigorodskii

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11968)

Included in the following conference series:

International Conference on Learning and Intelligent Optimization

Abstract

Determining whether a graph displays a clustered structure before subjecting it to any cluster detection technique has recently gained attention in the literature. Attempting to group graph vertices into clusters when a graph does not have a clustered structure is not only a waste of time; it also leads to misleading conclusions. To address this problem, we introduce a novel statistical test, the $\delta$-test, which is based on comparisons of local and global densities. Our goal is to assess whether a given graph meets the necessary conditions to be meaningfully summarized by clusters of vertices. We empirically explore our test's behavior under a number of graph structures. We also compare it to other recently published tests. From a theoretical standpoint, our test is more general, versatile and transparent than recently published competing techniques. It is based on the examination of intuitive quantities, applies equally to weighted and unweighted graphs, and allows comparisons across graphs. More importantly, it does not rely on any distributional assumptions, other than the universally accepted definition of a clustered graph. Empirically, our test is shown to be more responsive to graph structure than other competing tests.
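To make the phrase "local and global densities" concrete, the minimal Python sketch below computes the edge density of a whole toy graph and of two induced subgraphs. It is an illustration only and not the $\delta$-test itself, whose statistic and decision rule are not given in this abstract. The function name `density`, the toy edge list and the chosen vertex subsets are all hypothetical, and the sketch assumes an unweighted, undirected graph with no self-loops and no repeated edges.

```python
def density(edges, vertices):
    """Edge density of the subgraph induced by `vertices`.

    Assumes an undirected graph with no self-loops, given as a list of
    unique (u, v) pairs. Hypothetical helper, for illustration only.
    """
    vs = set(vertices)
    n = len(vs)
    if n < 2:
        return 0.0
    # Count edges with both endpoints inside the vertex subset.
    m = sum(1 for u, v in edges if u in vs and v in vs)
    # Divide by the number of possible vertex pairs.
    return m / (n * (n - 1) / 2)


# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

global_density = density(edges, range(6))   # 7 edges out of 15 possible pairs
local_a = density(edges, [0, 1, 2])         # first triangle
local_b = density(edges, [3, 4, 5])         # second triangle

print(global_density, local_a, local_b)
```

On this toy graph both induced subgraphs are markedly denser than the graph as a whole, which is the kind of heterogeneity in subgraph densities that the abstract associates with a clustered structure.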

Notes

1. Also, note that in this article we assume undirected graphs with no self-loops.

References

Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002)
Aleskerov, F., Goldengorin, B., Pardalos, P.: Clusters, Orders, and Trees: Methods and Applications. Springer, Heidelberg (2014). https://doi.org/10.1007/978-1-4939-0742-7
Alon, N., Shapira, A.: A characterization of the (natural) graph properties testable with one-sided error. SIAM J. Comput. 37(6), 1703–1727 (2008)
Arias-Castro, E., Verzelen, N.: Community detection in dense random networks. Ann. Statist. 42(3), 940–969 (2014). https://doi.org/10.1214/14-AOS1208
Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)
Barrat, A., Barthélemy, M., Pastor-Satorras, R., Vespignani, A.: The architecture of complex weighted networks. Proc. Natl. Acad. Sci. 101, 3747–3752 (2004)
Bickel, P.J., Sarkar, P.: Hypothesis testing for automated community detection in networks. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 78(1), 253–273 (2016). https://doi.org/10.1111/rssb.12117
Butenko, S., Chaovalitwongse, W.A., Pardalos, P.M.: Clustering Challenges in Biological Networks. World Scientific, Singapore (2009). https://doi.org/10.1142/6602
Chiplunkar, A., Kapralov, M., Khanna, S., Mousavifar, A., Peres, Y.: Testing graph clusterability: algorithms and lower bounds. ArXiv e-prints, August 2018
Czumaj, A., Peng, P., Sohler, C.: Testing cluster structure of graphs. ArXiv e-prints, April 2015
Eden, T., Ron, D., Seshadhri, C.: On Approximating the number of $k$-cliques in sublinear time. ArXiv e-prints, March 2018
Elenberg, E.R., Shanmugam, K., Borokhovich, M., Dimakis, A.G.: Beyond triangles: a distributed framework for estimating 3-profiles of large graphs. ArXiv e-prints, June 2015
Erdös, P., Rényi, A.: On random graphs I. Publ. Math. Debr. 6, 290 (1959)
Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010)
Fortunato, S., Hric, D.: Community detection in networks: a user guide. ArXiv e-prints, November 2016
Fronczak, A., Hołyst, J.A., Jedynak, M., Sienkiewicz, J.: Higher order clustering coefficients in Barabási-Albert networks. Phys. Stat. Mech. Its Appl. 316, 688–694 (2002)
Gao, C., Lafferty, J.: Testing for global network structure using small subgraph statistics. ArXiv e-prints, October 2017
Gao, C., Lafferty, J.: Testing network structure using relations between small subgraph probabilities. ArXiv e-prints, April 2017
Gishboliner, L., Shapira, A.: Deterministic vs non-deterministic graph property testing. ArXiv e-prints, April 2013
Goldreich, O., Ron, D.: Algorithmic aspects of property testing in the dense graphs model. SIAM J. Comput. 40(2), 376–445 (2011)
Hagberg, A., Schult, D., Swart, P.: Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux, G., Vaught, T., Millman, J. (eds.) Proceedings of the 7th Python in Science Conference, Pasadena, CA USA, pp. 11–15 (2008)
He, Z., Liang, H., Chen, Z., Zhao, C.: Detecting statistically significant communities. CoRR abs/1806.05602 (2018). http://arxiv.org/abs/1806.05602
Jin, J., Ke, Z.T., Luo, S.: Network global testing by counting graphlets. ArXiv e-prints, July 2018
Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1), 2 (2007)
Lovász, L., Vesztergombi, K.: Nondeterministic graph property testing. Comb. Probab. Comput. 22, 749–762 (2013)
Miasnikof, P., Shestopaloff, A.Y., Bonner, A.J., Lawryshyn, Y.: A statistical performance analysis of graph clustering algorithms. In: Bonato, A., Prałat, P., Raigorodskii, A. (eds.) WAW 2018. LNCS, vol. 10836, pp. 170–184. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92871-5_11
Ostroumova Prokhorenkova, L., Prałat, P., Raigorodskii, A.: Modularity of complex networks models. In: Bonato, A., Graham, F.C., Prałat, P. (eds.) WAW 2016. LNCS, vol. 10088, pp. 115–126. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49787-7_10
Prokhorenkova, L.O., Prałat, P., Raigorodskii, A.: Modularity in several random graph models. Electron. Notes Discret. Math. 61, 947–953 (2017). http://www.sciencedirect.com/science/article/pii/S1571065317302238. The European Conference on Combinatorics, Graph Theory and Applications (EUROCOMB 2017)
Prokhorenkova, L., Tikhonov, A.: Community detection through likelihood optimization: in search of a sound model. In: Proceedings of the 2019 World Wide Web Conference (WWW 2019) (2019)
Schaeffer, S.E.: Survey: graph clustering. Comput. Sci. Rev. 1(1), 27–64 (2007). https://doi.org/10.1016/j.cosrev.2007.05.001
Ugander, J., Backstrom, L., Kleinberg, J.: Subgraph frequencies: mapping the empirical and extremal geography of large graph collections. ArXiv e-prints, April 2013
Verzelen, N., Arias-Castro, E.: Community detection in sparse random networks. Ann. Appl. Probab. 25(6), 3465–3510 (2015). https://doi.org/10.1214/14-AAP1080
Yang, J., Leskovec, J.: Defining and evaluating network communities based on Ground-truth. CoRR abs/1205.6233 (2012). http://arxiv.org/abs/1205.6233
Yin, H., Benson, A.R., Leskovec, J.: Higher-order clustering in networks. ArXiv e-prints, 2018
Yin, H., Benson, A., Leskovec, J.: Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2017 (2017)

Acknowledgments

Pierre Miasnikof was supported by a Mitacs-Accelerate PhD award IT05806. He also wishes to thank Lasse Leskelä of Aalto University for the introduction to the work of Gao and Lafferty. Liudmila Prokhorenkova and Andrei Raigorodskii were supported by the Russian Science Foundation (grant number 16-11-10014).

Author information

Authors and Affiliations

Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Canada
Pierre Miasnikof
Moscow Institute of Physics and Technology, Dolgoprudny, Russia
Liudmila Prokhorenkova & Andrei Raigorodskii
Yandex, Moscow, Russia
Liudmila Prokhorenkova & Andrei Raigorodskii
The Alan Turing Institute, London, UK
Alexander Y. Shestopaloff

Corresponding author

Correspondence to Pierre Miasnikof.

Editor information

Editors and Affiliations

Technical University of Crete, Chania, Greece
Nikolaos F. Matsatsinis
Technical University of Crete, Chania, Greece
Yannis Marinakis
Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, USA
Panos Pardalos

About this paper

Cite this paper

Miasnikof, P., Prokhorenkova, L., Shestopaloff, A.Y., Raigorodskii, A. (2020). A Statistical Test of Heterogeneous Subgraph Densities to Assess Clusterability. In: Matsatsinis, N., Marinakis, Y., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2019. Lecture Notes in Computer Science, vol 11968. Springer, Cham. https://doi.org/10.1007/978-3-030-38629-0_2

DOI: https://doi.org/10.1007/978-3-030-38629-0_2
Published: 22 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38628-3
Online ISBN: 978-3-030-38629-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
