Identifying Key Nodes for the Influence Spread Using a Machine Learning Approach
<p>The framework for identifying influential nodes.</p> "> Figure 2
<p>Smart bins—clustering discretization on the Facebook network. The dashed line marks the top 5% of the nodes. Nodes are colored according to the class they fit into, with the blue ones being top class (most influential nodes).</p> "> Figure 3
<p>Machine learning algorithm comparison with mean values aggregated across all the experiments.</p> "> Figure 4
<p>LightGBM performance in various tasks.</p> "> Figure 5
<p>Model generalization.</p> "> Figure 6
<p>Comparison of <span class="html-italic">Smart Bins</span>, clustering discretization and <span class="html-italic">Fixed bins</span>, an arbitrary choice of top 5% nodes.</p> "> Figure 7
<p>Feature importance.</p> ">
Abstract
:1. Introduction
- We introduce a novel node labelling approach based on unsupervised machine learning, Smart Bins—a significant improvement over techniques used in the modern relevant literature.
- We propose alternative ways to define influential nodes by selecting them on the basis of spreading peak and time, which have not yet been used.
- We examine the proposed framework’s ability to train a model that can generalize beyond the network used for its training.
- We conduct a broad feature importance analysis, which allows us to select a group of the most crucial centrality methods for key node identification and classification in further research.
2. Related Work
3. Framework Description
3.1. Estimating Nodes’ Influence
- Influence range: Predicting the total range, i.e., the number of influenced nodes in the network. This is the main task and is widely recognized in the relevant literature.
- Influence peak: Predicting the maximal number of nodes activated in one iteration. A crucial characteristic from a practical perspective (e.g., the total number of people infected by a virus at once is predicted, which allows us to approximate the workload for medical services).
- Peak time: Predicting the number of iterations required for the process to reach the Influence peak. An important value for real-life use cases.
3.2. Obtaining the Labels
3.3. Selecting the Features for Machine Learning Algorithms
4. Experiments
- Citeseer [38]—A citation network where edges represent citations between the papers;
- Pubmed [38]— A citation network where nodes represent publications about diabetes;
- Facebook [39]—A social network portraying verified pages from the Facebook platform, with likes between them as edges;
- Github [39]—A social network representing the following between developers, that have at least ten repositories.
4.1. Machine Learning Algorithm Evaluation
4.2. Model Generalization
4.3. Impact of Smart Binning
4.4. The Importance of Features
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hou, L.; Liu, J.G.; Pan, X.; Wang, B.H. A social force evacuation model with the leadership effect. Phys. A Stat. Mech. Its Appl. 2014, 400, 93–99. [Google Scholar] [CrossRef]
- Hong, T.; Liu, Q. Seeds selection for spreading in a weighted cascade model. Phys. A Stat. Mech. Its Appl. 2019, 526, 120943. [Google Scholar] [CrossRef]
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
- Li, Y.; Cai, W.; Li, Y.; Du, X. Key node ranking in complex networks: A novel entropy and mutual information-based approach. Entropy 2019, 22, 52. [Google Scholar] [CrossRef]
- Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146. [Google Scholar]
- Ou, Y.; Guo, Q.; Liu, J. Identifying spreading influence nodes for social networks. Front. Eng. Manag. 2022, 9, 520–549. [Google Scholar] [CrossRef]
- Singh, S.S.; Srivastva, D.; Verma, M.; Singh, J. Influence maximization frameworks, performance, challenges and directions on social network: A theoretical study. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 7570–7603. [Google Scholar] [CrossRef]
- Bucur, D. Top influencers can be identified universally by combining classical centralities. Sci. Rep. 2020, 10, 20550. [Google Scholar] [CrossRef]
- Rezaei, A.A.; Munoz, J.; Jalili, M.; Khayyam, H. A machine learning-based approach for vital node identification in complex networks. Expert Syst. Appl. 2023, 214, 119086. [Google Scholar] [CrossRef]
- Zhao, G.; Jia, P.; Huang, C.; Zhou, A.; Fang, Y. A machine learning based framework for identifying influential nodes in complex networks. IEEE Access 2020, 8, 65462–65471. [Google Scholar] [CrossRef]
- Manchanda, S.; Mittal, A.; Dhawan, A.; Medya, S.; Ranu, S.; Singh, A. Gcomb: Learning budget-constrained combinatorial algorithms over billion-sized graphs. Adv. Neural Inf. Process. Syst. 2020, 33, 20000–20011. [Google Scholar]
- Hussain, O.A.; Zaidi, F. Influence maximization in complex networks through supervised machine learning. In Complex Networks & Their Applications X: Volume 2, Proceedings of the Tenth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2021 10; Springer International Publishing: Cham, Switzerland, 2022; pp. 217–228. [Google Scholar]
- Liu, C.; Fan, C.; Zhang, Z. Finding influencers in complex networks: An effective deep reinforcement learning approach. Comput. J. 2024, 67, 463–473. [Google Scholar] [CrossRef]
- Rashid, Y.; Bhat, J.I. Topological to deep learning era for identifying influencers in online social networks: A systematic review. Multimed. Tools Appl. 2024, 83, 14671–14714. [Google Scholar] [CrossRef]
- Freeman, L.C. Centrality in social networks: Conceptual clarification. In Social Network: Critical Concepts in Sociology; Routledge: Londres, UK, 2002; Volume 1, pp. 238–263. [Google Scholar]
- Zhang, J.; Zhang, Q.; Wu, L.; Zhang, J. Identifying influential nodes in complex networks based on multiple local attributes and information entropy. Entropy 2022, 24, 293. [Google Scholar] [CrossRef] [PubMed]
- Qiao, T.; Shan, W.; Zhou, C. How to identify the most powerful node in complex networks? A novel entropy centrality approach. Entropy 2017, 19, 614. [Google Scholar] [CrossRef]
- Namtirtha, A.; Dutta, A.; Dutta, B. Weighted kshell degree neighborhood method: An approach independent of completeness of global network structure for identifying the influential spreaders. In Proceedings of the 2018 10th International Conference on Communication Systems & Networks (COMSNETS), Bengaluru, India, 3–7 January 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 81–88. [Google Scholar]
- Silva, T.C.; Zhao, L. Network-based high level data classification. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 954–970. [Google Scholar] [CrossRef]
- Wang, F.; She, J.; Ohyama, Y.; Wu, M. Deep-learning-based identification of influential spreaders in online social networks. In Proceedings of the IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society, Lisbon, Portugal, 14–17 October 2019; IEEE: Piscataway, NJ, USA, 2019; Volume 1, pp. 6854–6858. [Google Scholar]
- Tiukhova, E.; Penaloza, E.; Óskarsdóttir, M.; Garcia, H.; Bahnsen, A.C.; Baesens, B.; Snoeck, M.; Bravo, C. Influencer Detection with Dynamic Graph Neural Networks. arXiv 2022, arXiv:2211.09664. [Google Scholar]
- Tixier, A.J.P.; Rossi, M.E.G.; Malliaros, F.D.; Read, J.; Vazirgiannis, M. Perturb and combine to identify influential spreaders in real-world networks. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Vancouver, BC, Canada, 27–30 August 2019; pp. 73–80. [Google Scholar]
- Ullah, A.; Sheng, J.; Wang, B.; Din, S.U.; Khan, N. Leveraging neighborhood and path information for influential spreaders recognition in complex networks. J. Intell. Inf. Syst. 2024, 62, 377–401. [Google Scholar] [CrossRef]
- Ullah, A.; Shao, J.; Yang, Q.; Khan, N.; Bernard, C.M.; Kumar, R. LSS: A locality-based structure system to evaluate the spreader’s importance in social complex networks. Expert Syst. Appl. 2023, 228, 120326. [Google Scholar] [CrossRef]
- Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137. [Google Scholar] [CrossRef]
- von Luxburg, U. A Tutorial on Spectral Clustering. CoRR 2007, 17, 395–416. [Google Scholar] [CrossRef]
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the KDD, Portland, Oregon, 2–4 August 1996; Volume 96, pp. 226–231. [Google Scholar]
- Freeman, L. Centrality in networks: I. conceptual clarifications. social networks. Soc. Netw. 1979, 10, 0378–8733. [Google Scholar]
- Brandes, U. On variants of shortest-path betweenness centrality and their generic computation. Soc. Netw. 2008, 30, 136–145. [Google Scholar] [CrossRef]
- Mones, E.; Vicsek, L.; Vicsek, T. Hierarchy measure for complex networks. PLoS ONE 2012, 7, e33799. [Google Scholar] [CrossRef]
- Zhang, J.X.; Chen, D.B.; Dong, Q.; Zhao, Z.D. Identifying a set of influential spreaders in complex networks. Sci. Rep. 2016, 6, 27823. [Google Scholar] [CrossRef] [PubMed]
- Newman, M.E. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E 2001, 64, 016132. [Google Scholar] [CrossRef]
- Saramäki, J.; Kivelä, M.; Onnela, J.P.; Kaski, K.; Kertesz, J. Generalizations of the clustering coefficient to weighted complex networks. Phys. Rev. E 2007, 75, 027105. [Google Scholar] [CrossRef]
- Batagelj, V.; Zaversnik, M. An O (m) algorithm for cores decomposition of networks. arXiv 2003, arXiv:cs/0310049. [Google Scholar]
- Bonacich, P. Power and centrality: A family of measures. Am. J. Sociol. 1987, 92, 1170–1182. [Google Scholar] [CrossRef]
- Page, L.; Brin, S.; Motwani, R.; Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web; Technical Report; Stanford InfoLab: Stanford, CA, USA, 1999. [Google Scholar]
- Boldi, P.; Vigna, S. Axioms for centrality. Internet Math. 2014, 10, 222–262. [Google Scholar] [CrossRef]
- Yang, Z.; Cohen, W.; Salakhudinov, R. Revisiting semi-supervised learning with graph embeddings. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 40–48. [Google Scholar]
- Rozemberczki, B.; Allen, C.; Sarkar, R. Multi-scale attributed node embedding. J. Complex Netw. 2021, 9, cnab014. [Google Scholar] [CrossRef]
- Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160. [Google Scholar] [CrossRef]
- Bródka, P.; Chmiel, A.; Magnani, M.; Ragozini, G. Quantifying layer similarity in multiplex networks: A systematic study. R. Soc. Open Sci. 2018, 5, 171747. [Google Scholar] [CrossRef] [PubMed]
- Aumann, R.J.; Shapley, L.S. Values of Non-Atomic Games; Princeton University Press: Princeton, NJ, USA, 1974. [Google Scholar]
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774. [Google Scholar]
Graph | Type | Nodes | Edges | Avg. Degree | Clustering Coefficient | Diameter | Transitivity |
---|---|---|---|---|---|---|---|
Citeseer | Cite | 3327 | 4536 | 2 | 0.07 | - | 0.13 |
Pubmed | Cite | 17,717 | 44,335 | 4 | 0.03 | 18 | 0.05 |
Social | 22,470 | 170,823 | 15 | 0.17 | 15 | 0.23 | |
Github | Social | 37,700 | 289,003 | 15 | 0.08 | 11 | 0.01 |
Graph | 2 bins | 3 bins | 4 bins | 5 bins |
---|---|---|---|---|
Citeseer | 4.170% | 1.387% | 0.837% | 0.157% |
Pubmed | 3.247% | 0.707% | 0.247% | 0.077% |
4.267% | 1.687% | 0.827% | 0.467% | |
Github | 10.397% | 4.337% | 2.477% | 1.727% |
Measure | Description |
---|---|
Degree | The number of connected edges to a particular node |
In-degree | The number of incoming edges from nearest neighbours |
Out-degree | The number of outcoming edges to nearest neighbours |
Average neighbour degree | Average degree of nearest neighbours |
Closeness [28] | Distance to other nodes |
Betweennes [29] | Fraction of all pairs’ shortest paths that pass through the node |
Local reaching [30] | It is a proportion of other nodes that are reachable from the node |
VoteRank [31] | Ranking of the nodes based on voting scheme |
Load [32] | The fraction of shortest paths that pass through the node |
Clustering coefficient [33] | The fraction of possible edges between node neighbours |
Core number [34] | It is the largest value of a maximal subgraph containing a node |
Eigenvector [35] | Computes the centrality for a node based on its neighbours’ centrality |
PageRank [36] | Computes a ranking of nodes based on the structure of the incoming edges |
Harmonic [37] | It is the sum of the reciprocal of the shortest path distances from the node to all others |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Stolarski, M.; Piróg, A.; Bródka, P. Identifying Key Nodes for the Influence Spread Using a Machine Learning Approach. Entropy 2024, 26, 955. https://doi.org/10.3390/e26110955
Stolarski M, Piróg A, Bródka P. Identifying Key Nodes for the Influence Spread Using a Machine Learning Approach. Entropy. 2024; 26(11):955. https://doi.org/10.3390/e26110955
Chicago/Turabian StyleStolarski, Mateusz, Adam Piróg, and Piotr Bródka. 2024. "Identifying Key Nodes for the Influence Spread Using a Machine Learning Approach" Entropy 26, no. 11: 955. https://doi.org/10.3390/e26110955