Abstract
According to the structural balance theory, a signed graph is considered structurally balanced when it can be partitioned into a number of modules such that positive and negative edges are respectively located inside and between the modules. In practice, real-world networks are rarely structurally balanced, though. In this case, one may want to measure the magnitude of their imbalance, and to identify the set of edges causing this imbalance. The correlation clustering (CC) problem precisely consists in looking for the signed graph partition having the least imbalance. Recently, it has been shown that the space of the optimal solutions of the CC problem can be constituted of numerous and diverse optimal solutions. Yet, this space is difficult to explore, as the CC problem is NP-hard, and exact approaches do not scale well even when looking for a single optimal solution. To alleviate this issue, in this work we propose an efficient enumeration method allowing to retrieve the complete space of optimal solutions of the CC problem. It combines an exhaustive enumeration strategy with neighborhoods of varying sizes, to achieve computational effectiveness. Results obtained for middle-sized networks confirm the usefulness of our method.










Similar content being viewed by others
Notes
DOI: 10.6084/m9.figshare.15043911.
DOI:10.6084/m9.figshare.5700832.v5.
References
Ahuja, R.K., Ergun, O., Orlin, J.B., Punnen, A.P.: A survey of very large-scale neighborhood search techniques. Discret. Appl. Math. 123(1–3), 75–102 (2002). https://doi.org/10.1016/s0166-218x(01)00338-9
Ales, Z., Knippel, A., Pauchet, A.: Polyhedral combinatorics of the k-partitioning problem with representative variables. Discret. Appl. Math. 211, 1–14 (2016). https://doi.org/10.1016/j.dam.2016.04.002
Aref, S., Wilson, M.C.: Balance and frustration in signed networks. J. Complex Netw. 7(2), 163–189 (2018). https://doi.org/10.1093/comnet/cny015
Arınık, N., Figueiredo, R., Labatut, V.: Signed graph analysis for the interpretation of voting behavior. In: International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities (2017)
Arınık, N., Figueiredo, R., Labatut, V.: Multiple partitioning of multiplex signed networks: application to European parliament votes. Soc. Netw. 60, 83–102 (2020). https://doi.org/10.1016/j.socnet.2019.02.001
Arınık, N., Figueiredo, R., Labatut, V.: Multiplicity and diversity: analyzing the optimal solution space of the correlation clustering problem on complete signed graphs. J. Complex Netw. (2020). https://doi.org/10.1093/comnet/cnaa025
Arthur, J.L., Hachey, M., Sahr, K., Huso, M., Kiester, A.R.: Finding all optimal solutions to the reserve site selection problem. Environ. Ecol. Stat. 4(2), 153–165 (1997). https://doi.org/10.1023/a:1018570311399
Bansal, N., Blum, A., Chawla, S.: Correlation clustering. In: 43rd Annual IEEE Symposium on Foundations of Computer Science, pp. 238–247 (2002). https://doi.org/10.1109/SFCS.2002.1181947
Blum, C., Roli, A.: Metaheuristics in combinatorial optimization. ACM Comput. Surv. 35(3), 268–308 (2003). https://doi.org/10.1145/937503.937505
Brusco, M., Steinley, D.: K-balance partitioning: An exact method with applications to generalized structural balance and other psychological contexts. Psychol. Methods 15(2), 145–157 (2010). https://doi.org/10.1037/a0017738
Camm, J.D., Polasky, S., Solow, A., Csuti, B.: A note on optimal algorithms for reserve site selection. Biol. Cons. 78(3), 353–355 (1996). https://doi.org/10.1016/0006-3207(95)00132-8
Cartwright, D., Harary, F.: Structural balance: a generalization of Heider’s theory. Psychol. Rev. 63, 277–293 (1956). https://doi.org/10.1037/h0046049
Charikar, M., Gupta, N., Schwartz, R.: Local guarantees in graph cuts and clustering. In: Integer Programming and Combinatorial Optimization, pp. 136–147. Springer (2017). https://doi.org/10.1007/978-3-319-59250-3_12
Damaschke, P.: Fixed-parameter enumerability of cluster editing and related problems. Theory Comput. Syst. 46(2), 261–283 (2010). https://doi.org/10.1007/s00224-008-9130-1
Danna, E., Fenelon, M., Gu, Z., Wunderling, R.: Generating multiple solutions for mixed integer programming problems. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 280–294. Springer (2007). https://doi.org/10.1007/978-3-540-72792-7_22
DasGupta, B., Enciso, G.A., Sontag, E., Zhang, Y.: Algorithmic and complexity results for decompositions of biological networks into monotone subsystems. Biosystems 9(1), 161–178 (2007). https://doi.org/10.1016/j.biosystems.2006.08.001
Davis, J.A.: Clustering and structural balance in graphs. Hum. Relat. 20(2), 181–187 (1967). https://doi.org/10.1177/001872676702000207
Demaine, E.D., Emanuel, D., Fiat, A., Immorlica, N.: Correlation clustering in general weighted graphs. Theoret. Comput. Sci. 361(2–3), 172–187 (2006). https://doi.org/10.1016/j.tcs.2006.05.008
Doreian, P., Lloyd, P., Mrvar, A.: Partitioning large signed two-mode networks: problems and prospects. Soc. Netw. 35(2), 178–203 (2013). https://doi.org/10.1016/j.socnet.2012.01.002
Doreian, P., Mrvar, A.: A partitioning approach to structural balance. Soc. Netw. 18(2), 149–168 (1996). https://doi.org/10.1016/0378-8733(95)00259-6
Doreian, P., Mrvar, A.: Structural balance and signed international relations. J. Soc. Struct. 16, 1 (2015). https://doi.org/10.21307/joss-2019-012
Esteban, J., Mayoral, L., Ray, D.: Ethnicity and conflict: an empirical study. Am. Econ. Rev. 102(4), 1310–1342 (2012). https://doi.org/10.1257/aer.102.4.1310
Figueiredo, R., Moura, G.: Mixed integer programming formulations for clustering problems related to structural balance. Soc. Netw. 35(4), 639–651 (2013). https://doi.org/10.1016/j.socnet.2013.09.002
Fischetti, M., Glover, F., Lodi, A.: The feasibility pump. Math. Program. 104(1), 91–104 (2005). https://doi.org/10.1007/s10107-004-0570-3
Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34(3), 596–615 (1987). https://doi.org/10.1145/28869.28874
Good, B.H., de Montjoye, Y.A., Clauset, A.: Performance of modularity maximization in practical contexts. Phys. Rev. E 81(4), 046106 (2010). https://doi.org/10.1103/physreve.81.046106
Grötschel, M., Wakabayashi, Y.: A cutting plane algorithm for a clustering problem. Math. Program. 45(1–3), 59–96 (1989). https://doi.org/10.1007/bf01589097
Grötschel, M., Wakabayashi, Y.: Facets of the clique partitioning polytope. Math. Program. 47(1–3), 367–387 (1990). https://doi.org/10.1007/bf01580870
Gürsoy, F., Badur, B.: Extracting the signed backbone of intrinsically dense weighted networks. arXiv e-prints arXiv:2012.05216 (2020)
IBM: Ibm ilog cplex 12.8 user manual ibm corporation (2018)
Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice-Hall, Hoboken (2000)
Keuper, M., Lukasik, J., Singh, M., Yarkony, J.: A benders decomposition approach to correlation clustering. In: The International Conference for High Performance Computing, Networking, Storage, and Analysis (2020)
Koschützki, D., Lehmann, K.A., Peeters, L., Richter, S., Tenfelde-Podehl, D., Zlotowski, O.: Centrality indices. In: Network Analysis: Methodological Foundations, Lecture Notes in Computer Science, vol. 3418, pp. 16–61 (2005). https://doi.org/10.1007/978-3-540-31955-9_3
Kuhn, H.W.: The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2(1–2), 83–97 (1955). https://doi.org/10.1002/nav.3800020109
Kunegis, J., Lommatzsch, A., Bauckhage, C.: The slashdot zoo. In: Proceedings of the 18th International Conference on World Wide Web, pp. 741–750. ACM Press (2009). https://doi.org/10.1145/1526709.1526809
Liu, X., Cheng, H.M., Zhang, Z.Y.: Evaluation of community detection methods. IEEE Trans. Knowl. Data Eng. (2019). https://doi.org/10.1109/tkde.2019.2911943
Ma, F., Hao, J.K.: A multiple search operator heuristic for the max-k-cut problem. Ann. Oper. Res. 248(1), 365–403 (2016). https://doi.org/10.1007/s10479-016-2234-0
Massa, P., Avesani, P.: Controversial users demand local trust metrics: an experimental study on epinions.com community. In: Proceedings of the 20th National Conference on Artificial Intelligence, vol. 1, pp. 121–126 (2005)
Mehrotra, A., Trick, M.A.: Cliques and clustering: a combinatorial approach. Oper. Res. Lett. 22(1), 1–12 (1998). https://doi.org/10.1016/s0167-6377(98)00006-6
Nemhauser, G., Wolsey, L.: Integer and Combinatorial Optimization. Wiley, New York (1999)
Nowozin, S., Jegelka, S.: Solution stability in linear programming relaxations. In: Proceedings of the 26th Annual International Conference on Machine Learning—ICML. ACM Press (2009). https://doi.org/10.1145/1553374.1553473
Oosten, M., Rutten, J.H.G.C., Spieksma, F.C.R.: The clique partitioning problem: facets and patching facets. Networks 38(4), 209–226 (2001). https://doi.org/10.1002/net.10004
Pevehouse, J., Nordstrom, T., Warnke, K.: The correlates of war 2 international governmental organizations data version 2.0. Confl. Manag. Peace Sci. 21(2), 101–119 (2004). https://doi.org/10.1080/07388940490463933
Queiroga, E., Subramanian, A., Figueiredo, R., Frota, Y.: Integer programming formulations and efficient local search for relaxed correlation clustering. J. Global Optim. (2021). https://doi.org/10.1007/s10898-020-00989-7
Tan, S., Lü, J.: An evolutionary game approach for determination of the structural conflicts in signed networks. Sci. Rep. 6(1), 22022 (2016). https://doi.org/10.1038/srep22022
Zaslavsky, T.: Balanced decompositions of a signed graph. J. Comb. Theory Ser. B 43(1), 1–13 (1987). https://doi.org/10.1016/0095-8956(87)90026-8
Acknowledgements
This research benefited from the support of Agorantic research federation (FR 3621), as well as the FMJH (Jacques Hadamard Mathematics Foundation) through PGMO (Gaspard Monge Program for Optimisation and operational research), and from the support to this program from EDF, Thales, Orange and Criteo.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
2-Partition and 2-chorded cycle inequalities
For the sake of completeness, we detail below the 2-partition and 2-chorded cycle inequalities:
-
Let \(S, T \subseteq V\) be two nonempty disjoint subsets of V. Then, the 2-partition inequality, illustrated in Fig. 11a, is defined as
$$\begin{aligned} \sum \limits _{u \in S} \sum \limits _{v \in T} x_{uv} - \sum _{\begin{array}{c} (u,v) \in S\\ u \ne v \end{array}} x_{uv} - \sum _{\begin{array}{c} (u,v) \in T\\ u \ne v \end{array}} x_{uv} \le min\{|S|,|T|\}. \end{aligned}$$ -
Let C \(\subseteq \) E be a cycle of length at least 5, and \(\overline{C} = \{v_i v_{i+2} | i=1,..,|C|-2\} \cup \{v_1 v_{|C|-1}, v_2 v_{|C|}\}\) be a 2-chorded cycle of C. Then, the 2-chorded cycle inequality, illustrated in Fig. 11b, is defined as
$$\begin{aligned} \sum \limits _{(u,v) \in C} x_{uv} - \sum \limits _{(u,v) \in \overline{C}} x_{uv} \le \lfloor \frac{|C|}{2} \rfloor . \end{aligned}$$
Edit distance between two membership vectors
Before calculating the edit distance between two membership vectors, we need to select one of them as the reference vector, in order to adapt the module assignments of the other vector based accordingly. Hence, the edit distance is calculated between the reference vector and this newly changed one, that we call relative vector.
The task of adapting the relative vector w.r.t the reference one can be expressed as an assignment problem, also known as maximum weighted bipartite matching problem, as already done in the literature [36]. Let \(\pi ^s\) and \(\pi ^t\) be two membership vectors of length n, associated with the partitions \(P^s\) and \(P^t\), which contain \(\ell ^s\) and \(\ell ^t\) modules, respectively. Also, since the edit distance is symmetric, without loss of generality, let \(\ell ^s \le \ell ^t\). Moreover, let CM be the \(\ell ^s\ \times \ \ell ^t\) confusion matrix of \(\pi ^s\) and \(\pi ^t\). The term \(CM_{ij}\), with \(1\le i \le \ell ^s\) and \(1\le j \le \ell ^t\), represents the number of vertices in the intersection of modules \(M^s_{i}\) and \(M^t_{j}\), i.e. \(|M^s_{i} \cap M^t_{j}|\). Then, we look for a bijection \(f: \{ 1, 2, \ldots , \ell ^s \} \rightarrow \{ 1, 2, \ldots , \ell ^t \}\) that maximizes the number of vertices common to pairs of modules from both membership vectors, i.e.
Since this problem can be modelled as an assignment or a maximum weighted bipartite matching problem, it can be solved in various ways. One of them is through the well-known Hungarian algorithm, whose complexity is \(O(n^3)\) [34]. Nevertheless, the best polynomial time algorithm is currently based on the network simplex method, and it runs in \(O(|V||E| + |V|^2 log(|V|))\) time using a Fibonacci heap data structure [25]. One final remark is about the case of \(\ell ^s < \ell ^t\), in which there are \(|\ell ^t - \ell ^s|\) unassigned module labels in \(\pi ^t\). In this case, one can arbitrarily renumber these labels, starting from \(\ell ^{s}+1\).
Finally, the edit distance between two membership vectors is calculated by simply counting the number of cases where the module labels of the vertices in the reference and relative vectors are different.
Proofs
1.1 All proofs related to the MVMO property for 3-edit operations on complete unweighted signed graphs
We complete the proof of Lemma 2 by verifying below the conditions of all atomic 3-edit operations for unweighted complete signed networks (illustrated in Fig. 12), where and \((u,v), (u,z), (v,z) \in \widetilde{E}\). Recall that
.
- a) :
-
We have \((\gamma ^{left}_{u} = a_{uv} + a_{uz}) > (\gamma ^{right}_{u} = -a_{uv} -a_{uz})\), \((\gamma ^{left}_{v} = a_{uv} + a_{vz}) > (\gamma ^{right}_{v} = -a_{uv} -a_{vz})\) and \((\gamma ^{left}_{z} = a_{uz} + a_{vz}) > (\gamma ^{right}_{z} = -a_{uz} -a_{vz})\). We see that \(a_{uv}\), \(a_{uz}\) and \(a_{vz}\) cannot be negative.
- b) :
-
We have \((\gamma ^{left}_{u} = a_{uv} + a_{uz}) > (\gamma ^{right}_{u} = 0)\), \((\gamma ^{left}_{v} = a_{uv} + a_{vz}) > (\gamma ^{right}_{v} = -a_{vz})\) and \((\gamma ^{left}_{z} = a_{uz} + a_{vz}) > (\gamma ^{right}_{z} = -a_{vz})\). We see that \(a_{uv}\), \(a_{uz}\) and \(a_{vz}\) cannot be negative.
- c) :
-
We have \((\gamma ^{left}_{u} = a_{uv} + a_{uz}) > (\gamma ^{right}_{u} = 0)\), \((\gamma ^{left}_{v} = a_{uv} + a_{vz}) > (\gamma ^{right}_{v} = 0)\) and \((\gamma ^{left}_{z} = a_{uz} + a_{vz}) > (\gamma ^{right}_{z} = 0)\). We see that \(a_{uv}\), \(a_{uz}\) and \(a_{vz}\) cannot be negative.
- d) :
-
We have \((\gamma ^{left}_{u} = a_{uv}) > (\gamma ^{right}_{u} = -a_{uv} -a_{uz})\), \((\gamma ^{left}_{v} = a_{uv}) > (\gamma ^{right}_{v} = -a_{uv} -a_{vz})\) and \((\gamma ^{left}_{z} = 0) > (\gamma ^{right}_{z} = -a_{uz} -a_{vz})\). We see that \(a_{uv}\), \(a_{uz}\) and \(a_{vz}\) must be positive.
- e) :
-
We have \((\gamma ^{left}_{u} = a_{uv} - a_{uz}) > (\gamma ^{right}_{u} = -a_{uv} + a_{uz})\), \((\gamma ^{left}_{v} = a_{uv} - a_{vz}) > (\gamma ^{right}_{v} = -a_{uv} + a_{vz})\) and \((\gamma ^{left}_{z} = -a_{uz} - a_{vz}) > (\gamma ^{right}_{z} = a_{uz} + a_{vz})\). We see that \(a_{uv}\) (resp. \(a_{uz}\) and \(a_{vz}\)) cannot be negative (resp. positive).
- f) :
-
We have \((\gamma ^{left}_{u} = a_{uv} - a_{uz}) > (\gamma ^{right}_{u} = -a_{uv})\), \((\gamma ^{left}_{v} = a_{uv} - a_{vz}) > (\gamma ^{right}_{v} = -a_{uv})\) and \((\gamma ^{left}_{z} = 0) > (\gamma ^{right}_{z} = a_{uz} + a_{vz})\). We see that \(a_{uv}\), \(a_{uz}\) and \(a_{vz}\) cannot be negative.
- g) :
-
We have \((\gamma ^{left}_{u} = a_{uv}) > (\gamma ^{right}_{u} = a_{uz} -a_{uv})\), \((\gamma ^{left}_{v} = a_{uv}) > (\gamma ^{right}_{v} = a_{vz} - a_{uv})\) and \((\gamma ^{left}_{z} = -a_{uz} - a_{vz}) > (\gamma ^{right}_{z} = 0)\). We see that \(a_{uv}\), \(a_{uz}\) and \(a_{vz}\) cannot be negative.
- h) :
-
We have \((\gamma ^{left}_{u} = a_{uv}) > (\gamma ^{right}_{u} = -a_{uv})\), \((\gamma ^{left}_{v} = a_{uv}) > (\gamma ^{right}_{v} = - a_{uv})\) and \((\gamma ^{left}_{z} = -a_{uz} - a_{vz}) > (\gamma ^{right}_{z} = 0)\). We see that \(a_{uv}\), \(a_{uz}\) and \(a_{vz}\) cannot be negative.
- i) :
-
We have \((\gamma ^{left}_{u} = a_{uv}) > (\gamma ^{right}_{u} = a_{uz})\), \((\gamma ^{left}_{v} = a_{uv} - a_{vz}) > (\gamma ^{right}_{v} = a_{vz})\) and \((\gamma ^{left}_{z} = -a_{uz} - a_{vz}) > (\gamma ^{right}_{z} = + a_{vz})\). We see that \(a_{uv}\) (resp. \(a_{uz}\) and \(a_{vz}\)) cannot be negative (resp. positive).
- j) :
-
We have \((\gamma ^{left}_{u} = a_{uv}) > (\gamma ^{right}_{u} = - a_{uz})\), \((\gamma ^{left}_{v} = a_{uv}) > (\gamma ^{right}_{v} = -a_{vz})\) and \((\gamma ^{left}_{z} = 0) > (\gamma ^{right}_{z} = -a_{uz} + a_{vz})\). We see that \(a_{uv}\) and \(a_{vz}\) (resp. \(a_{uz}\)) must be positive (resp. negative).
- k) :
-
We have \((\gamma ^{left}_{u} = a_{uv}) > (\gamma ^{right}_{u} = 0)\), \((\gamma ^{left}_{v} = a_{uv} - a_{vz}) > (\gamma ^{right}_{v} = 0)\) and \((\gamma ^{left}_{z} = 0) > (\gamma ^{right}_{z} = a_{vz})\). We see that \(a_{uv}\) and \(a_{vz}\) (resp. \(a_{uz}\)) must be positive (resp. negative).
- l) :
-
We have \((\gamma ^{left}_{u} = -a_{uv}) > (\gamma ^{right}_{u} = a_{uz})\), \((\gamma ^{left}_{v} = -a_{vz}) > (\gamma ^{right}_{v} = a_{uv})\) and \((\gamma ^{left}_{z} = -a_{uz}) > (\gamma ^{right}_{z} = a_{vz})\). We see that \(a_{uv}\) and \(a_{vz}\) (resp. \(a_{uz}\)) must be positive (resp. negative).
- m) :
-
We have \((\gamma ^{left}_{u} = -a_{uv}) > (\gamma ^{right}_{u} = 0)\), \((\gamma ^{left}_{v} = -a_{vz}) > (\gamma ^{right}_{v} = a_{uv})\) and \((\gamma ^{left}_{z} = 0) > (\gamma ^{right}_{z} = a_{vz})\). We see that \(a_{uv}\) and \(a_{vz}\) (resp. \(a_{uz}\)) must be positive (resp. negative).
- n) :
-
We have \((\gamma ^{left}_{u} = -a_{uv}) > (\gamma ^{right}_{u} = a_{uv} - a_{uz})\), \((\gamma ^{left}_{v} = -a_{uv}) > (\gamma ^{right}_{v} = a_{uv} + a_{vz})\) and \((\gamma ^{left}_{z} = -a_{vz}) > (\gamma ^{right}_{z} = -a_{uz})\). We see that \(a_{uv}\) (resp. \(a_{uz}\) and \(a_{vz}\)) cannot be negative (resp. positive).
- o) :
-
We have \((\gamma ^{left}_{u} = -a_{uv}) > (\gamma ^{right}_{u} = 0)\), \((\gamma ^{left}_{v} = 0) > (\gamma ^{right}_{v} = a_{uv} - a_{vz})\) and \((\gamma ^{left}_{z} = 0) > (\gamma ^{right}_{z} = -a_{vz})\). We see that \(a_{uv}\) (resp. \(a_{uz}\) and \(a_{vz}\)) cannot be negative (resp. positive).
- p) :
-
We have \((\gamma ^{left}_{u} = -a_{uv}) > (\gamma ^{right}_{u} = -a_{uz})\), \((\gamma ^{left}_{v} = 0) > (\gamma ^{right}_{v} = a_{uv} + a_{vz})\) and \((\gamma ^{left}_{z} = -a_{vz}) > (\gamma ^{right}_{z} = -a_{uz})\). We see that \(a_{uv}\) (resp. \(a_{uz}\) and \(a_{vz}\)) cannot be negative (resp. positive).
- r) :
-
We have \((\gamma ^{left}_{u} = 0) > (\gamma ^{right}_{u} = -a_{uv} - a_{uz})\), \((\gamma ^{left}_{v} = 0) > (\gamma ^{right}_{v} = -a_{uv} - a_{vz})\) and \((\gamma ^{left}_{z} = 0) > (\gamma ^{right}_{z} = -a_{uz} - a_{vz})\). We see that \(a_{uv}\), \(a_{uz}\) and \(a_{vz}\) must be positive.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Arınık, N., Figueiredo, R. & Labatut, V. Efficient enumeration of the optimal solutions to the correlation clustering problem. J Glob Optim 86, 355–391 (2023). https://doi.org/10.1007/s10898-023-01270-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10898-023-01270-3