A Survey of Deep Graph Clustering
Abstract—Graph clustering, which aims to divide the nodes of a graph into several distinct clusters, is a fundamental yet challenging task. Benefiting from the powerful representation capability of deep learning, deep graph clustering methods have achieved great success in recent years. However, the corresponding survey papers are relatively scarce, and it is urgent to summarize this field. From this motivation, we conduct a comprehensive survey of deep graph clustering. Firstly, we introduce the formulaic definition, evaluation, and development of this field. Secondly, the taxonomy of deep graph clustering methods is presented based on four different criteria, including graph type, network architecture, learning paradigm, and clustering method. Thirdly, we carefully analyze the existing methods via extensive experiments and summarize the challenges and opportunities from five perspectives, including graph data quality, stability, scalability, discriminative capability, and unknown cluster number. Besides, the applications of deep graph clustering methods in six domains, including computer vision, natural language processing, recommendation systems, social network analyses, bioinformatics, and medical science, are presented. Last but not least, this paper provides open resource supports, including 1) a collection (https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering) of state-of-the-art deep graph clustering methods (papers, codes, and datasets) and 2) a unified framework (https://github.com/Marigoldwu/A-Unified-Framework-for-Deep-Attribute-Graph-Clustering) of deep graph clustering. We hope this work can serve as a quick guide and help researchers overcome challenges in this vibrant field.

Index Terms—Deep Graph Clustering, Graph Neural Network, Self-Supervised Learning, Clustering Analysis.
1 INTRODUCTION
Graph clustering is an important and challenging task that aims to separate the nodes of a graph into different clusters in an unsupervised manner. In recent years, benefiting from the strong representation capability of deep learning [1], especially graph neural networks (GNNs) [2], [3], [4], [5], [6], [7], [8], deep graph clustering has witnessed fruitful advances. However, unlike the deep clustering area [9], [10], [11], [12], survey papers on deep graph clustering [13] are relatively scarce. For example, [14], [15] mainly survey papers about community detection with deep learning. To better assist researchers in reviewing, summarizing, and planning for the future, a comprehensive survey of deep graph clustering is expected. From this motivation, we make a comprehensive survey of deep graph clustering in this paper.

In general, the encoding network is trained in a self-supervised manner to embed the nodes into the latent space. After encoding, the clustering method C separates the node embeddings Z into several disjoint clusters. The formulaic definitions and the important baselines of deep graph clustering are deliberated in Section 2.

Moreover, in recent years, deep graph clustering methods have been successfully applied to numerous domains, such as social network analysis, recommendation systems, computer vision, natural language processing, bioinformatics, medical science, etc. Detailed information can be found in Section 5. The main contributions of this paper are summarized as follows.

• We present a comprehensive survey of the deep graph clustering domain to help researchers review, summarize, solve challenges, and plan for the future.
• We design a taxonomy of recent deep graph clustering methods based on four aspects, i.e., graph type, network architecture, learning paradigm, and clustering method.
• We conduct extensive experiments to summarize the challenges in the deep graph clustering field from five perspectives, including graph data quality, stability, scalability, discriminative capability, and unknown cluster number. Besides, through careful analyses, we provide potential promising technical solutions.
• We share two practical open resources, including a collection of state-of-the-art deep graph clustering methods and a unified framework of deep graph clustering.

Figure 2: The taxonomy of deep graph clustering, based on four criteria: graph type (pure structure, attribute, heterogeneous, and dynamic graphs), network architecture (MLP, GNN, and hybrid), learning paradigm (reconstructive, adversarial, contrastive, and hybrid), and clustering method (traditional and neural clustering).

Figure 3: The challenges in deep graph clustering, including graph data quality, stability, scalability, discriminative capability, and unknown cluster number.

In this section, we first introduce the basic notation and the formulaic definition of deep graph clustering. Then the critical deep graph clustering baselines are deliberated.

2.1 Notation
The basic notations in this paper are summarized in Table 1. First, we take the attribute graph as an example input of deep graph clustering methods. Given an attribute graph G_A = {V, E, X}, V = {v_1, v_2, ..., v_n} denotes the node set, which contains n nodes belonging to k classes. Besides, E denotes the set of edges that link the nodes. In addition, each node in the graph attaches d′-dimensional attributes. In matrix form, X ∈ R^{n×d′} denotes the node attribute matrix and A ∈ R^{n×n} denotes the adjacency matrix. y ∈ R^n denotes the ground-truth labels of the nodes. We introduce other types of input graphs in Section 3.1.

2.2 Task Definition
The target of deep graph clustering is to encode the nodes in the graph with neural networks and divide them into different clusters. Figure 1 demonstrates this process. Formally, the self-supervised neural network F first encodes the nodes of the attribute graph G_A as follows:

Z = F(G_A) = F(X, A),    (1)

where X ∈ R^{n×d′} denotes the node attribute matrix, A ∈ R^{n×n} denotes the adjacency matrix, and Z ∈ R^{n×d} denotes the learned node embeddings. Generally, the self-supervised neural network F is trained with pre-text tasks like structure reconstruction, attribute reconstruction, contrastive learning, adversarial learning, etc. Details about the network architecture and learning paradigm are introduced in Section 3.2 and Section 3.3, respectively. After encoding, the clustering method C groups the nodes into several disjoint clusters.
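The following is a minimal, illustrative sketch of this pipeline: a toy two-layer graph convolutional encoder produces the embeddings Z = F(X, A) of Eq. (1), and k-Means then acts as the clustering method C. The random data, the encoder architecture, and all sizes are assumptions for illustration only, and the self-supervised training of F (discussed in Sections 3.2 and 3.3) is omitted.

```python
import torch
from sklearn.cluster import KMeans

def normalize_adj(A):
    # Symmetrically normalize the adjacency matrix with self-loops: D^{-1/2}(A+I)D^{-1/2}.
    A_hat = A + torch.eye(A.shape[0])
    d_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

class GCNEncoder(torch.nn.Module):
    # A toy two-layer graph convolutional encoder F: Z = F(X, A).
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = torch.nn.Linear(in_dim, hid_dim)
        self.w2 = torch.nn.Linear(hid_dim, out_dim)

    def forward(self, x, a_norm):
        h = torch.relu(a_norm @ self.w1(x))
        return a_norm @ self.w2(h)

# Toy attribute graph: n nodes, d' attributes, random undirected edges (illustrative only).
n, d_in, d_out, k = 100, 16, 8, 3
X = torch.randn(n, d_in)
A = (torch.rand(n, n) < 0.05).float()
A = torch.triu(A, 1); A = A + A.T          # symmetric, no self-loops
A_norm = normalize_adj(A)

encoder = GCNEncoder(d_in, 32, d_out)
Z = encoder(X, A_norm)                      # Eq. (1): node embeddings Z = F(X, A)

# Clustering method C: group the embeddings into k disjoint clusters.
labels = KMeans(n_clusters=k, n_init=10).fit_predict(Z.detach().numpy())
print(labels[:10])
```

In practice, the encoder would first be trained with one of the self-supervised objectives surveyed below before the clustering step is applied.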
Figure 4: Timeline of the important baselines in the deep graph clustering field (2014–2023). Stage I: graph clustering via DNNs; Stage II: adversarial learning; Stage III: data fusion; Stage IV: contrastive learning; Stages V & VI: scalability & complex scenes.
Some graph pooling methods [36], [37], [38] also contribute to this field. Then, O2MAC [39] and MAGCN [40] make attempts to exploit deep neural networks for attributed multi-view graph clustering via multiple graph view reconstruction and view-consistency information learning, respectively. Besides, a self-supervised method, SGCMC [41], utilizes the clustering labels to guide the network learning, thus improving the clustering performance. R-GAE [42] rethinks the graph-auto-encoder-based deep graph clustering methods by considering feature randomness and feature drift. And scTAG [43] applies the graph-auto-encoder-based methods to single-cell RNA sequencing.

Stage IV: apply contrastive learning. More recently, contrastive learning has become the mainstream paradigm in the domains of vision [44], [45], [46], [47], [48], [49], [50] and graph [51], [52], [53], [54], [55], [56], [57], and contrastive deep graph clustering methods are increasingly proposed. Specifically, in the AGE model, [58] first filters the high-frequency noises in the node attributes and then trains the encoder by adaptively discriminating the positive and negative samples. In the same year, MVGRL [59] generates an augmented structural view and contrasts node embeddings from one view with graph embeddings of another view and vice versa. Following up, MCGC [60] and HeCo [61] extend the contrastive paradigm to multi-view clustering and heterogeneous graph learning.

Stage V: improve contrastive learning. Although the effectiveness of the contrastive learning paradigm has been verified, there are still many open technical problems. Specifically, [62] proposes GDCL to correct the sampling bias in contrastive deep graph clustering. Besides, to avoid the semantic drift caused by inappropriate data augmentations, AFGRL [63] is proposed by replacing data augmentations with node discovery. Differently, Liu et al. [64] propose an augmentation-free contrastive deep graph clustering method by designing parameter-unshared encoders. Then, to refine the noisy connections in the graph, [65] proposes SUBLIME by generating a sketched graph view with unsupervised structure learning. Moreover, [66], [67] design the dual correlation reduction strategy in the DCRN model to alleviate the representation collapse problem in deep graph clustering. To further enhance the discriminative capability of networks, Liu et al. [68] guide the networks to learn hard sample pairs. CCGC [69] proposes a new construction method for positive and negative sample pairs.

Stage VI: scale to large graphs & apply to more complex scenarios. However, previous methods fail to scale to large graphs, easily leading to out-of-memory failures and long running times. To this end, Devvrit et al. [70] scale contrastive deep graph clustering to large-scale graphs. Additionally, Dink-Net [71] unifies representation learning and clustering optimization into an end-to-end framework on large-scale graphs via dilation and shrink cluster loss functions. Besides, Shiao et al. [72] utilize the node clustering method to accelerate graph representation learning. Besides, Sun et al. [73] rethink the graph clustering problem from a geometric perspective and introduce the heterogeneous curvature space into deep graph clustering. To adapt deep graph clustering methods to both homophily and heterophily graphs, DGCN [74] is proposed by designing a mixed graph filter and dual encoders. And Wen et al. [75] propose a framework to match unpaired multi-view graphs.

2.4 Evaluation
This section introduces the evaluation of deep graph clustering. Deep graph clustering is a purely unsupervised task. Therefore, it is hard to evaluate the clustering performance of deep graph clustering methods without the ground truth. Generally, an excellent deep graph clustering algorithm learns a clustering distribution that has small within-cluster variance and large between-cluster variance. The mainstream metrics can be categorized into two classes [76], i.e., extrinsic metrics and intrinsic metrics. Calculating extrinsic metrics requires the ground truth, while calculating intrinsic metrics does not require labels. In the research track, researchers conduct experiments on graph data with human annotations, thus the extrinsic metrics of clustering are more common. However, in the industry scenario, labels of nodes are always scarce, therefore the intrinsic metrics are more practical.

Firstly, we introduce four extrinsic metrics, including accuracy (ACC) [77], normalized mutual information (NMI) [78], adjusted Rand index (ARI) [79], and F1 score. These metrics need the ground truth labels, therefore the typical pre-processing operation is to map the cluster-ID to the class-ID with the Kuhn-Munkres algorithm [80]. Then the metrics can be calculated. Specifically, accuracy (ACC) is calculated as follows:

ACC = ( Σ_{i=1}^{n} ϕ(y_i, map(ŷ_i)) ) / n,    (4)

where ŷ_i and y_i denote the predicted cluster-ID and the ground truth label of the i-th sample, respectively. Besides, map(·) denotes the Kuhn-Munkres algorithm [80], which maps the predicted cluster-ID to the class-ID. ϕ(·) is an indicator function formulated as:

ϕ(y_i, map(ŷ_i)) = 1 if y_i = map(ŷ_i), and 0 otherwise.    (5)

In addition, the F1-score also evaluates the clustering performance. It is a balance of precision and recall, calculated as follows:

F1 = 2 · (precision · recall) / (precision + recall),    (6)

precision = TP / (TP + FP),    recall = TP / (TP + FN),    (7)

where TP, FP, and FN denote the true positive samples, false positive samples, and false negative samples, respectively. Further, the calculation of normalized mutual information (NMI) is based on mutual information. It is more robust to unbalanced sample distributions. It can be formulated as follows:

NMI = ( 2 Σ_ŷ Σ_y p(ŷ, y) log [ p(ŷ, y) / (p(ŷ) p(y)) ] ) / ( −[ Σ_i p(ŷ_i) log p(ŷ_i) + Σ_j p(y_j) log p(y_j) ] ),    (8)

where p(ŷ), p(y), and p(ŷ, y) denote the distribution of the predicted results, the distribution of the ground truth, and their joint distribution, respectively. Differently, another metric named adjusted Rand index (ARI) is based on the pairwise similarity between the predicted labels and the ground truth labels. It is formulated as follows:

ARI = (RI − expected RI) / (max(RI) − expected RI),    (9)

where RI denotes the Rand index [79] and expected RI [79] denotes the expected Rand index. An ARI equal to 0 indicates that the predicted partition agrees with the ground truth only at the level of random assignment.
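As an illustration of these extrinsic metrics, the sketch below computes ACC with the Kuhn-Munkres mapping of Eqs. (4)–(5) via scipy, and relies on the scikit-learn implementations of NMI and ARI; the toy labels are assumptions for demonstration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC with the Kuhn-Munkres (Hungarian) mapping from cluster-IDs to class-IDs (Eqs. 4-5)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                          # co-occurrence counts of cluster p and class t
    rows, cols = linear_sum_assignment(-cost)    # maximize the number of matched samples
    mapping = dict(zip(rows, cols))              # map(.) : cluster-ID -> class-ID
    return np.mean([mapping[p] == t for t, p in zip(y_true, y_pred)])

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])            # same partition, permuted cluster-IDs
print(clustering_accuracy(y_true, y_pred))           # 1.0
print(normalized_mutual_info_score(y_true, y_pred))  # NMI, cf. Eq. (8)
print(adjusted_rand_score(y_true, y_pred))           # ARI, cf. Eq. (9)
```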
Table 2: Taxonomy table of deep graph clustering methods: part I. The criteria contain four aspects: graph type, network architecture,
learning paradigm, and clustering method.
For pure structure graphs, GraphEncoder [16] encodes the graph structure with the sparse auto-encoders [17]. Besides, [19] applies random surfing in the graph and embeds the graph structure with the stacked denoising auto-encoder. For attribute graphs, the additional attribute information always brings more processing operations and performance improvements. For instance, [21] integrates the attributes and the graph structure with the graph convolutional operation [2]. Also, [33] transfers the attribute embeddings to the GCN layer with the delivery operator. Moreover, [34], [111] demonstrate the effectiveness of the attribute-structure fusion mechanism. The heterogeneous graph is more complex since it may contain different types of nodes and edges. To handle these graphs, [61] adopts the meta-path technology while [39] treats them as multi-view graphs. Different from the aforementioned static graphs, dynamic graphs change over time, increasing the difficulty of clustering. To solve this problem, [120] performs graph clustering in an incremental learning fashion. [118] adopts a variant of the gated recurrent unit (GRU) [121] to capture the temporal information. Liu et al. [119] present a general framework for temporal deep graph clustering, and [122] collect several datasets for temporal deep graph clustering.

3.2 Network Architecture
For the network architecture, the mainstream deep graph clustering methods can roughly be categorized into three classes: MLP-based methods, GNN-based methods, and hybrid methods.

3.2.1 MLP-based Methods
The MLP-based methods utilize multi-layer perceptrons (MLPs) [123] to extract the informative features in graphs. For example, GraphEncoder [16] and DNGR [19] encode the graph structure with auto-encoders. Subsequently, in ProGAN [26] and CommunityGAN [27], the authors adopt MLPs to design the generative adversarial networks. Moreover, based on MLPs, AGE [58] and SCGC [64] design the adaptive encoder and the parameter-unshared encoders, respectively, to embed the smoothed node features into the latent space, as sketched below. Although the effectiveness of these methods has been demonstrated, it is difficult for MLPs to capture the non-Euclidean structural information in graphs. Thus, GNN-based methods have been increasingly proposed in recent years.
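The sketch below illustrates the general idea behind such MLP-based encoders: node attributes are first smoothed with a simple low-pass graph filter and then embedded by a plain MLP, so the graph structure is only used in the pre-processing step. The specific filter, the number of smoothing steps, and the network sizes are illustrative assumptions rather than the exact designs of AGE [58] or SCGC [64].

```python
import torch

def laplacian_smoothing(X, A, t=2):
    # Apply a low-pass graph filter t times: H = (I - 0.5 * L_sym)^t X,
    # where L_sym = I - D^{-1/2}(A + I)D^{-1/2} (an illustrative choice of filter).
    n = A.shape[0]
    A_hat = A + torch.eye(n)
    d_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))
    L_sym = torch.eye(n) - d_inv_sqrt @ A_hat @ d_inv_sqrt
    H = X.clone()
    for _ in range(t):
        H = (torch.eye(n) - 0.5 * L_sym) @ H
    return H

class MLPEncoder(torch.nn.Module):
    # The graph is only used in the smoothing step; the encoder itself is a plain MLP.
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, hid_dim), torch.nn.ReLU(),
            torch.nn.Linear(hid_dim, out_dim))

    def forward(self, H):
        return self.net(H)

n, d = 50, 16
X = torch.randn(n, d)
A = (torch.rand(n, n) < 0.1).float(); A = torch.triu(A, 1); A = A + A.T
Z = MLPEncoder(d, 64, 8)(laplacian_smoothing(X, A, t=2))
print(Z.shape)  # torch.Size([50, 8])
```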
Table 3: Taxonomy table of deep graph clustering methods: part II. The criteria contain four aspects: graph type, network architecture,
learning paradigm, and clustering method.
The reconstructive methods have achieved promising performance. For instance, MGAE [21] is proposed to learn the node attribute and graph structure with the designed graph auto-encoder. In addition, [31] also belongs to this category. In general, reconstructive methods focus on the intra-data information in the graph and adopt the node attributes and graph structure as the self-supervision signals. The general pipeline of reconstructive deep graph clustering methods is illustrated in Figure 6. The core designs of the reconstructive methods are the encoder architecture, the decoder architecture, and the reconstruction objective functions. Researchers improve reconstructive methods from these perspectives.

3.3.2 Adversarial Methods
The adversarial deep graph clustering methods aim to improve the quality of features by creating a zero-sum game between the generator and the discriminator. Specifically, the discriminator is trained to recognize whether the learned features are from the real data distribution, while the generator aims to generate confusing embeddings to cheat the discriminator. In these settings, the generation and discrimination operations provide effective self-supervision signals, guiding the network to learn high-quality embeddings. Figure 7 demonstrates the general pipeline of the adversarial deep graph clustering methods. The core technologies determining the quality of adversarial methods contain the generator designs, discriminator designs, noise generation, and discriminative loss functions. Several works aim to improve the performance of adversarial methods from these aspects.

Figure 7: The general pipeline of adversarial deep graph clustering. Firstly, the generator aims to generate high-quality node embeddings Z from the input graph. Subsequently, the discriminator is trained to distinguish the fake information and the learned features. After the self-supervised training, the clustering method C separates the learned node embeddings into several clusters.

3.3.3 Contrastive Methods
The critical idea in contrastive deep graph clustering methods is to improve the discriminativeness of features by pulling together the positive samples while pushing away the negative ones. Thus, contrastive methods focus on the inter-data information and construct the self-supervision signals via meaningful relationships between samples. The general pipeline of contrastive methods can be found in Figure 8. The main techniques in contrastive methods include graph data augmentations, siamese encoder designs, positive and negative sample pair construction, negative sampling, contrastive learning loss functions, etc. These aspects are carefully modified to enhance the discriminative capability of contrastive learning methods.

Figure 8: The general pipeline of contrastive deep graph clustering methods. At first, the augmented graph is generated via data augmentations, and nodes are embedded into the latent space by the encoders. Subsequently, the networks are guided to pull together positive samples and push away negative samples. Finally, the node embeddings from different views are fused and grouped into distinct clusters via the clustering method C.

3.3.4 Hybrid Methods
In recent years, some papers have demonstrated the effectiveness of combining different learning paradigms. For example, in ARGA [25], Pan et al. adopt both the reconstructive and adversarial learning paradigms. Besides, researchers [41], [66] also verified the effectiveness of the combination of the reconstructive and contrastive learning paradigms. Moreover, in AGC-DRR [115], the researchers show that an adversarial mechanism is a new option for data augmentation in contrastive learning. How to better combine different learning paradigms remains an open question.

3.4 Clustering Method
The clustering methods in deep graph clustering aim to separate the learned node embeddings into different clusters in a purely unsupervised manner. They can roughly be categorized into two classes: traditional clustering and neural clustering.

3.4.1 Traditional Clustering
The traditional clustering methods [18], [22], [126], [127], [128], [129] can be directly performed on the learned node embeddings to group them into disjoint clusters, as in many early deep graph clustering methods [16], [19], [21], [25], [31], [58], [59], [106]. Although they efficiently achieve promising performance, the traditional clustering methods have the following drawbacks. 1) First, the clustering process cannot benefit from the strong representation capability of deep neural networks, thus limiting the discriminative capability of the samples. 2) Second, in these methods, the node representation learning and the clustering learning cannot be jointly optimized in an end-to-end manner, therefore leading to sub-optimal performance. 3) Third, these methods cannot easily adopt batch training and batch inference techniques, limiting the model's scalability on large graphs.

3.4.2 Neural Clustering
To alleviate the aforementioned issues of traditional clustering methods, the neural clustering methods are designed to group the samples into different clusters with deep neural networks. Concretely, in the neural clustering methods, the clustering process and the deep neural network are jointly optimized by gradient descent algorithms [130], [131].

For example, in many two-stage neural clustering methods [28], [33], [34], [39], [40], [62], [66], [96], [111], a widely-used strategy is to first pre-train the network in a self-supervised manner and then fine-tune it with a clustering objective, e.g., a KL-divergence-based self-training loss or supervision signals from high-confidence pseudo labels, as sketched below.
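As an illustration of the fine-tuning stage, the sketch below shows the widely known DEC-style [29] self-training objective: soft assignments Q between embeddings and cluster centers are sharpened into a target distribution P, and KL(P || Q) is minimized by gradient descent. It is a generic illustration under simplifying assumptions (for brevity, the embeddings are optimized directly instead of the encoder parameters), not the exact loss of any particular graph clustering method.

```python
import torch

def soft_assignment(Z, centers, alpha=1.0):
    # Student's t-kernel similarity between embeddings and cluster centers (DEC-style Q).
    dist2 = torch.cdist(Z, centers).pow(2)
    q = (1.0 + dist2 / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # Sharpen the soft assignments to build the self-training target P.
    p = q.pow(2) / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

# Step 1 (pre-training) is assumed done: Z stands in for pre-trained node embeddings,
# and the centers could be initialized, e.g., by k-Means on Z.
Z = torch.randn(100, 8, requires_grad=True)
centers = torch.randn(5, 8, requires_grad=True)
opt = torch.optim.Adam([Z, centers], lr=1e-3)

# Step 2 (fine-tuning): minimize KL(P || Q) so embeddings and centers are optimized jointly.
for _ in range(10):
    q = soft_assignment(Z, centers)
    p = target_distribution(q).detach()
    loss = torch.nn.functional.kl_div(q.log(), p, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
print(q.argmax(dim=1)[:10])   # current cluster assignments
```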
For instance, the overall objective of Dink-Net [71] combines two cluster-level loss terms:

L_Dink-Net = L_dilation + L_shrink,    (20)

where the first term L_dilation denotes the cluster dilation loss, and the second term L_shrink denotes the cluster shrink loss. Besides, Z ∈ R^{n×d} and C ∈ R^{k×d} denote the node embeddings and the cluster center embeddings, and n, k, b, d denote the node number, cluster number, batch size, and hidden feature dimension, respectively. In Eq. 18, the cluster dilation loss aims to train the network to push away different cluster centers. In addition, in Eq. 19, the cluster shrink loss aims to pull the samples to the cluster centers. Note that the cluster shrink loss pulls batch samples to the cluster centers and easily scales to large graphs. Besides, each sample will be pulled to all cluster centers rather than only the corresponding nearest cluster center. The authors claim this design alleviates the confirmation bias problem [133]. The time complexity and space complexity of calculating L_Dink-Net are O(bkd + k²d + bd) and O(bk + bd + kd), respectively.

Although verified effective, the two-stage neural clustering methods heavily depend on an excellent initialization of the cluster centers. And the precise cluster center initialization relies on promising network pre-training and the auxiliary of traditional clustering methods like k-Means, leading to high training and deployment costs. To overcome this shortcoming, many one-stage methods [64], [97], [115] have been gradually proposed recently. For example, [64] proposes a new method termed SCGC with Laplacian smoothing and neighborhood-oriented contrastive learning. Besides, in AGC-DRR [115], a clustering sub-network is developed to directly predict the probability of each cluster-ID for each sample. Moreover, rather than dealing with clustering as a downstream task, [97] proposes a unified framework termed GCC by minimizing the difference between node embeddings and reconstructed cluster embeddings. The one-stage neural clustering will be a promising future direction. In the next section, we analyze and summarize the technical challenges and opportunities in the deep graph clustering field in detail.

4 CHALLENGE & OPPORTUNITY
In recent years, we have witnessed the fast growth of deep graph clustering. More and more methods have been proposed and achieved promising performance. However, most methods are under some perfect assumptions, and there are still many challenges in the field of deep graph clustering. From this motivation, this section aims to summarize the main technical challenges. As shown in Figure 3, the main challenges in deep graph clustering contain five aspects, i.e., graph data quality, stability, scalability, discriminative capability, and unknown cluster number. In the following five sub-sections, we will analyze these challenges via experiments and provide potential solutions and opportunities.

Table 4: The statistical information of thirteen attribute graph datasets. "# Nodes" denotes the number of nodes, "# Edges" denotes the number of edges, "# Feature Dims" denotes the feature dimension number, and "# Classes" denotes the number of categories.

Dataset           # Nodes       # Edges          # Feature Dims  # Classes
BAT               131           1,038            81              4
EAT               399           5,994            203             4
UAT               1,190         13,599           239             4
Cora              2,708         5,278            1,433           7
CoraFull          19,793        63,421           8,710           70
CiteSeer          3,327         4,614            3,703           6
ACM               3,025         13,128           1,870           3
DBLP              4,057         3,528            334             4
Amazon-Photo      7,650         119,081          745             8
ogbn-arxiv        169,343       1,166,243        128             40
Reddit            232,965       23,213,838       602             41
ogbn-products     2,449,029     61,859,140       100             47
ogbn-papers100M   111,059,956   1,615,685,872    128             172

4.1 Graph Data Quality
The existing conventional deep graph clustering methods are always under the assumption that the input graphs are perfectly correct, namely, the connections between nodes are complete and correct, and the node information is also complete and precise. However, these assumptions do not always hold, especially in the industrial scenario. The real-world graph data are always of low quality. The noises come from two aspects, i.e., nodes and edges. Firstly, for the nodes in graphs, the node attributes may contain erroneous information or incomplete information. To be specific, the erroneous information is caused by error records and easily spreads in the whole graph during the message-passing process. In addition, incomplete information contains two types, i.e., partly missing attributes and completely missing attributes. The latter is more difficult to process. Secondly, the edges in graphs may also contain erroneous information and incomplete information. Concretely, the erroneous information denotes wrong connections between nodes, while the incomplete information denotes the missing edges, which should exist in the correct graph. Besides, the conventional graph convolutional network encoders [2], [134] are based on the homophily assumption [135], i.e., the connected nodes contain similar semantics. However, in the real-world scenario, the connected nodes may not have similar features. For example, in heterophily graphs [136] such as fraudulent networks, the hackers and regular users will construct dense connections, but they do not share similar underlying semantics.

To verify this challenge in deep graph clustering, we conduct experiments on one of the representative state-of-the-art methods, HSAN [68], on five datasets, including Cora, CiteSeer, BAT, EAT, and UAT. The detailed information of the datasets can be found in Table 4. The experimental results are demonstrated in Table 5 and Table 6.

In Table 5, we evaluate the clustering performance of HSAN [68] with missing data. The missing information contains node attributes and edges, and the drop rates are set to 10%, 30%, 50%, 70%, and 90%, respectively. From the experimental results, three conclusions are drawn. Firstly, when the information on graphs is missing, the deep graph clustering method cannot achieve promising performance. Secondly, the clustering performance drops when the drop rate increases. Thirdly, the node attributes are more critical to the clustering performance on some citation graphs, such as the CiteSeer dataset, while the edges are more critical to the clustering performance on some airport activity graphs, such as the BAT dataset.

In addition, in Table 6, the performance of HSAN [68] with noisy information is tested. Similarly, the noisy information contains noised node attributes and noised edges. Concretely, for the node attributes, we add Gaussian noises to them, and the standard deviations are set to 0.01, 0.1, 1, and 10, respectively. Besides, for the edges in the graph, we add Gaussian noises to the adjacency matrix, and the standard deviations are set to 0.01, 0.1, 1, and 10, respectively. We have two conclusions from the experimental results. Firstly, the noises in attributes and edges limit the clustering performance. Secondly, when the noise rate increases, the performance drops significantly.
Table 5: Graph data quality testing experiments. Clustering performance of HSAN [68] on five datasets with missing information
including dropped node attributes and dropped edges. Experimental results are the mean value of ten runs.
Dataset / Metric | Origin | Attribute Drop Rate: 10% 30% 50% 70% 90% | Edge Drop Rate: 10% 30% 50% 70% 90%

Cora
ACC 77.73 | 75.22 74.00 68.07 63.74 46.16 | 77.13 75.00 71.53 62.96 48.81
NMI 59.45 | 57.93 54.73 48.26 40.62 21.21 | 59.55 56.74 51.69 42.54 29.54
ARI 57.62 | 54.76 51.72 42.05 36.01 16.42 | 58.03 53.81 47.31 38.10 18.98
F1  76.16 | 73.70 72.78 67.72 62.91 45.39 | 74.97 72.09 68.48 59.34 47.12

CiteSeer
ACC 71.36 | 69.64 64.52 62.26 51.21 36.13 | 70.83 67.66 62.48 53.64 55.06
NMI 45.03 | 42.01 35.56 32.33 19.83 8.66  | 44.53 42.12 35.30 26.80 27.50
ARI 46.81 | 43.94 36.19 32.23 19.74 8.02  | 45.95 38.75 31.38 22.38 23.50
F1  63.63 | 62.25 56.98 54.09 45.08 28.75 | 63.44 59.53 56.06 49.59 48.84

BAT
ACC 77.86 | 77.61 75.32 74.30 76.08 70.74 | 65.90 56.74 49.87 48.35 51.15
NMI 53.00 | 53.34 50.19 49.40 50.96 45.29 | 41.70 30.36 22.33 24.82 22.75
ARI 51.16 | 51.26 47.42 46.25 48.61 42.26 | 36.32 23.94 15.40 15.42 17.09
F1  77.80 | 77.48 75.15 74.02 75.85 69.40 | 62.71 53.10 45.43 43.00 50.50

EAT
ACC 57.73 | 57.73 58.15 58.06 57.39 55.97 | 55.22 52.88 47.70 44.78 45.36
NMI 34.12 | 34.35 34.86 33.83 33.00 31.73 | 33.10 29.38 22.26 16.79 17.88
ARI 27.22 | 27.37 28.17 27.33 26.99 25.58 | 26.16 22.89 15.57 11.40 13.69
F1  58.09 | 57.98 58.31 58.21 57.20 55.85 | 51.02 47.87 45.79 44.43 43.42

UAT
ACC 56.16 | 56.89 55.18 55.29 54.45 55.77 | 52.55 46.69 42.86 42.94 47.90
NMI 26.89 | 26.60 25.51 23.55 23.77 24.68 | 20.30 14.58 13.57 15.04 19.06
ARI 25.21 | 24.49 23.11 23.23 22.77 24.79 | 19.40 13.82 09.63 11.98 16.28
F1  55.99 | 54.33 51.46 53.86 51.91 55.05 | 49.60 43.53 40.65 38.13 44.13
Table 6: Graph data quality testing experiments. Clustering performance of HSAN [68] on five datasets with noisy information including
noised node attributes and noised edges. Experimental results are the mean value of ten runs.
Next, we discuss the potential solutions. Firstly, for the incomplete node attributes or edges, the imputation networks [137], [138] can help alleviate the information missing issue. Besides, for the erroneous information, the denoising technologies [139], [140], [141] might effectively remove the noises in the node attributes and edges. Additionally, to solve the homophily and heterophily problems, researchers propose various methods [74], [136].

4.2 Stability
The stability of deep graph clustering algorithms is essential, especially in sensitive domains such as financial risk control, social network anomaly detection, etc. Different from the supervised methods or the semi-supervised methods, deep graph clustering methods group the nodes into disjoint clusters in a purely unsupervised manner, i.e., without any human annotations. Therefore, without the guidance of ground truth, the stability of deep graph clustering models is relatively weak. For example, for a classical clustering method such as k-Means [18], its promising performance highly relies on the quality of the initial cluster centers. Similarly, the performance of deep clustering techniques [29], [30] is also sensitive to the initialization of the neural networks and trainable cluster centers. Next, we systematically analyze the randomness in the deep graph clustering methods. It mainly consists of two parts as follows. Firstly, one of the key factors in achieving promising clustering performance is the excellent representation capability of the learned node embeddings. Thus, the initialization and training processes of the embedding networks influence the stability of deep graph clustering methods. Secondly, a clustering method with strong discriminative capability is another critical factor. Therefore, the initialization and optimization processes of the neural cluster centers also easily affect the stability of deep graph clustering methods.
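The sketch below illustrates the evaluation protocol discussed in this subsection: the same clustering step is repeated with different random seeds and the metric is reported as mean ± standard deviation. The embeddings and labels are random stand-ins, and k-Means is used as a placeholder clustering method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Stand-ins for learned node embeddings and ground-truth labels (illustrative data only).
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 16))
y = rng.integers(0, 5, size=500)

scores = []
for seed in range(10):                                   # ten runs with different random seeds
    pred = KMeans(n_clusters=5, n_init=1, random_state=seed).fit_predict(Z)
    scores.append(normalized_mutual_info_score(y, pred))
print(f"NMI: {np.mean(scores):.2f} ± {np.std(scores):.2f}")
```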
Table 7: Stability testing of deep graph clustering methods. Average clustering performance of eleven methods on six datasets. The
experimental results are obtained with ten runs and presented with standard deviation.
Metric | k-Means AE DEC IDEC GAE DAEGC ARGA SDCN MVGRL DFCN DCRN

DBLP
ACC 0.65 0.35 0.56 0.62 1.22 0.48 0.59 1.81 1.02 0.80 0.25
NMI 0.38 0.16 0.28 0.50 0.91 0.45 0.92 1.34 0.63 1.00 0.44
ARI 0.39 0.43 0.39 0.60 1.40 0.52 0.91 2.01 0.50 1.50 0.46
F1  0.27 0.36 0.51 0.56 2.23 0.67 0.66 1.51 1.51 0.80 0.26

CiteSeer
ACC 3.17 0.13 0.20 1.42 0.80 1.39 0.49 0.31 0.36 0.20 0.18
NMI 3.22 0.08 0.30 2.40 0.65 0.86 0.71 0.32 0.40 0.20 0.35
ARI 3.02 0.14 0.36 2.65 1.18 1.24 0.70 0.43 0.73 0.30 0.30
F1  3.53 0.11 0.17 1.39 0.82 1.32 0.31 0.24 0.39 0.20 0.21

ACM
ACC 0.71 0.08 0.76 0.52 1.44 2.83 0.36 0.18 0.76 0.20 0.20
NMI 0.46 0.16 1.51 1.16 1.92 4.15 0.82 0.25 1.40 0.40 0.61
ARI 0.69 0.16 1.87 1.50 3.10 3.89 0.86 0.40 1.76 0.40 0.52
F1  0.74 0.08 0.74 0.48 1.33 2.79 0.35 0.19 0.72 0.20 0.20

Amazon-Photo
ACC 0.76 0.08 0.08 0.08 2.48 0.01 2.30 0.81 1.79 0.80 0.13
NMI 1.33 0.30 0.05 0.08 2.79 0.03 2.76 0.83 1.31 1.00 0.24
ARI 0.44 0.47 0.04 0.07 4.57 0.02 4.41 1.23 0.47 0.84 0.20
F1  0.51 0.20 0.12 0.11 1.76 0.02 1.95 1.49 2.39 0.31 0.12

PubMed
ACC 0.01 0.31 0.09 0.34 0.81 0.03 0.12 1.30 0.52 0.07 0.07
NMI 0.02 0.57 0.14 0.29 3.54 0.03 0.17 2.04 1.45 0.13 0.08
ARI 0.01 0.67 0.13 0.39 1.39 0.04 0.17 2.07 1.06 0.11 0.12
F1  0.01 0.29 0.10 0.32 0.85 0.02 0.13 1.21 0.36 0.07 0.08

CoraFULL
ACC 1.10 0.19 0.45 0.31 0.81 1.00 0.43 0.40 2.95 0.81 0.60
NMI 0.84 0.25 0.24 0.28 0.75 0.73 0.25 0.39 3.95 0.41 0.35
ARI 0.57 0.27 0.29 0.22 0.86 0.47 0.24 0.27 2.63 0.48 0.49
F1  1.09 0.30 0.58 0.41 0.75 1.33 0.41 0.43 2.87 0.87 0.76
We conduct experiments to test the stability of existing deep graph clustering methods on six datasets, including DBLP, CiteSeer, ACM, Amazon-Photo, PubMed, and CoraFULL. The compared methods include DCRN [66], DFCN [34], MVGRL [59], DAEGC [28], SDCN [33], ARGA [25], GAE [4], DEC [29], IDEC [30], AE [142], and k-Means [18]. All experimental results are obtained with ten runs, and the results are reported with standard deviation. The experimental results indicate that the stability of recent methods seems good. However, there are various problems with their implementation. Firstly, the classical method k-Means initializes the cluster centers multiple times to find an excellent initialization. Secondly, most of the existing methods such as DEC [29], IDEC [30], DAEGC [28], ARGA [25], SDCN [33], MVGRL [59], DFCN [34], and DCRN [66] rely on well pre-trained node embeddings and cluster center embeddings. And the manual trial processes of network pre-training and center initialization seem to increase the stability of deep graph clustering methods. However, in the real-world scenario, implementing these procedures is costly. Thus, we consider that the stability of the existing deep graph clustering methods is overestimated. It will be a promising direction to improve the robustness and stability of deep graph clustering methods.

In the existing studies, researchers conduct ten runs with different random seeds to alleviate the influence of randomness on the experimental results. We consider that some optimization strategies [143], [144] might enhance the stability of deep graph clustering.

4.3 Scalability
In this section, we discuss the scalability of deep graph clustering methods. Although achieving promising performance, most previous deep graph clustering methods cannot scale to large-scale graph datasets, such as ogbn-papers100M [132]. The heavy time and memory costs mainly come from two parts, as follows. Firstly, the graph encoders need to be trained on large-scale graphs. Secondly, the clustering algorithms need to group a large number of nodes into separate clusters.

For the first part, batch training [145] will be a good solution. Concretely, researchers can partition the large graph into mini-batch sub-graphs via sub-graph extraction technology like random walk [146] and then train the networks on the sub-graphs. Here, unlike images or texts, one of the essential characteristics of graph data is the relational connections between nodes. Thus, keeping the original connection information after the graph partition is vital for the model's performance.

For the second part, given the learned node embeddings Z ∈ R^{n×d}, the classical k-Means clustering algorithm takes O(tknd) time complexity and O(nd + kd) space complexity. Here, n, k, and d are the node number, cluster number, and feature dimension number, respectively. Besides, t is the iteration number of k-Means. On large graphs, the node number n is very large, e.g., 111M nodes on the ogbn-papers100M dataset [132], easily leading to out-of-memory failures on GPU or long calculation times on CPU.

To demonstrate this challenge in deep graph clustering methods, we conduct experiments and theoretical analyses, including comparison experiments, running cost evaluations, and complexity analyses, as shown in Table 8. Concretely, fifteen important baselines are tested on four large graph datasets, including ogbn-arXiv, ogbn-products, Reddit, and ogbn-papers100M. The detailed statistics of the datasets are summarized in Table 4. The baselines contain k-Means [18], DEC [29], DCN [147], node2vec [148], DGI [51], AGE [58], MinCutPool [35], MVGRL [59], BGRL [55], GRACE [52], ProGCL [149], AGC-DRR [115], DCRN [66], S3GC [70], and Dink-Net [71]. From these experimental results, three conclusions are obtained as follows. Firstly, the classical method k-Means can scale to large graphs with millions of nodes but cannot achieve promising performance since its representation learning capability is limited. Secondly, most of the deep graph clustering methods like MVGRL [59], AGE [58], GRACE [52], and DCRN [66] fail to run on large graphs. The reasons are that they either process the graph diffusion matrix or do not adopt batch training techniques.
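Regarding the clustering cost discussed above, the sketch below shows one simple remedy, mini-batch k-Means (here via scikit-learn's MiniBatchKMeans), which updates the centers from small batches instead of all n embeddings at once; the embedding matrix and all sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in for node embeddings of a large graph (n kept modest here for illustration).
rng = np.random.default_rng(0)
Z = rng.normal(size=(100_000, 64)).astype(np.float32)

# Mini-batch k-Means touches only `batch_size` samples per update,
# avoiding the O(tknd) full-batch cost of classical k-Means.
mbk = MiniBatchKMeans(n_clusters=40, batch_size=4096, n_init=3, random_state=0)
labels = mbk.fit_predict(Z)
print(labels.shape, mbk.cluster_centers_.shape)   # (100000,), (40, 64)
```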
Table 8: Scalability testing experiments on four large graphs. “OOM” indicates that method raises the out-of-memory failure.
Dataset Metric k-Means DEC DCN node2vec DGI AGE MVGRL BGRL GRACE ProGCL AGC-DRR DCRN S3 GC Dink-Net
ACC 18.11 21.25 19.91 29.00 31.40 22.70 29.86 35.00 43.68
NMI 22.13 25.14 23.81 40.60 41.20 32.10 37.51 46.30 43.73
ogbn-arXiv OOM OOM OOM OOM OOM
ARI 7.43 10.28 8.25 19.00 22.30 13.00 25.74 27.00 35.22
F1 12.94 15.57 13.06 22.00 23.00 16.60 21.79 23.00 26.92
ACC 18.11 23.79 24.50 35.70 32.00 35.21 40.20 41.09
ogbn- NMI 22.13 24.47 21.92 48.90 46.70 46.59 53.60 50.78
OOM OOM OOM OOM OOM OOM
products ARI 7.43 9.05 10.96 17.00 17.40 19.87 23.00 21.08
F1 12.94 13.54 13.95 24.70 19.20 21.55 25.00 25.15
ACC 8.90 70.90 22.40 65.41 73.60 76.03
NMI 11.40 79.20 30.60 70.48 80.70 78.91
Reddit OOM OOM OOM OOM OOM OOM OOM OOM
ARI 2.90 64.00 17.00 63.42 74.50 71.34
F1 6.80 55.10 18.30 51.45 56.00 67.95
ACC 14.60 17.50 15.10 17.30 26.67
ogbn- NMI 37.33 38.00 41.60 45.30 54.92
OOM OOM OOM OOM OOM OOM OOM OOM OOM
papers100M ARI 7.54 11.20 9.60 11.00 18.01
F1 10.45 11.10 11.10 11.80 19.48
Table 9: Efficiency analyses of eight deep graph clustering methods. The left two columns denote the time and space complexity analyses. The right two columns denote the GPU memory and time costs. Experimental results are obtained on the Cora dataset. n, b, k, l, g, d′, and d denote the node number, batch size, cluster number, edge number, average degree of the graph, attribute dimension, and latent feature dimension, respectively.

Method     Time Complexity (per iteration)   Space Complexity     GPU Memory Cost (MB)   Time Cost (s)
DEC        O(nkd)                            O(nk + nd + kd)      1294                   14.59
node2vec   O(bd)                             O(nd)                -                      111.03
DGI        O(ld′ + nd²)                      O(l + nd + d²)       3798                   19.03
MVGRL      O(n²d + nd²)                      O(n² + nd + d²)      9466                   168.20
GRACE      O(n²d + ld + d²)                  O(l + nd)            1292                   44.77
BGRL       O(ld + nd²)                       O(l + nd + d²)       1258                   44.18
S3GC       O(ngd²)                           O(nd + bgd + d²)     1474                   508.21
Dink-Net   O(nkd + k²d + bd)                 O(bk + bd + kd)      1248                   35.09
Thirdly, the scalable deep graph clustering methods S3GC [70] and Dink-Net [71] can run on large graphs like ogbn-papers100M [132] with 111 million nodes and achieve promising performance. S3GC [70] conducts contrastive learning on the graph and performs k-Means on the learned node embeddings. In addition, Dink-Net [71] unifies the representation learning and clustering optimization via the cluster dilation and cluster shrink loss functions. Besides, in Table 9, we analyze the time complexity and space complexity, and test the GPU memory costs and time costs, of eight deep graph clustering methods, including DEC [29], node2vec [148], DGI [51], MVGRL [59], GRACE [52], BGRL [55], S3GC [70], and Dink-Net [71].

Next, we discuss the potential techniques to improve the scalability of deep graph clustering methods. For the first part, i.e., encoder training on the large-scale graph, the possible techniques include better sub-graph extraction methods, more efficient batch training strategies, etc. For the second part, i.e., node clustering on the large-scale graph, the potential solutions contain mini-batch k-Means [150], finding cluster centers via the neural network [71], calculating the clustering assignments in a mini-batch manner [71], bipartite graph clustering [151], [152], etc.

4.4 Discriminative Capability
One key factor of promising clustering performance is to learn discriminative node embeddings, which help the clustering methods better group the nodes into different clusters. Therefore, the discriminative capability of the network is vital in self-supervised learning. [66] observes the representation collapse problem, i.e., the encoder tends to embed all samples into the same cluster in deep graph clustering methods. To demonstrate this problem, we conduct extensive experiments, including three types: performance comparison experiments, similarity visualization experiments, and embedding visualization experiments.

At first, as shown in Table 10, we conduct performance comparison experiments on six datasets, including the DBLP, CiteSeer, ACM, Amazon-Photo, PubMed, and CoraFull datasets. The detailed information of these datasets is listed in Table 4. We compare thirteen state-of-the-art methods, including DEC [29], GAE [4], DAEGC [28], ARGA [25], SDCN [33], DFCN [34], MVGRL [59], MCGC [60], SCAGC [104], SCGC [64], AGC-DRR [115], DCRN [66], and IDCRN [67]. From these experimental results, we have three observations as follows. Firstly, the performance of deep clustering methods such as DEC [29] is limited since they omit the graph structure of the graph data. Secondly, various deep reconstructive graph clustering methods such as GAE [4], DAEGC [28], SDCN [33], and DFCN [34] significantly improve the clustering performance since they reconstruct the graph information and improve the discriminative capability of networks. Thirdly, the recent contrastive methods like MVGRL [59], SCGC [64], and DCRN [66] further enhance the sample discriminative capability via pulling together positive sample pairs and pushing away negative sample pairs, achieving the most promising performance.

In addition, to intuitively demonstrate this challenge, we conduct the t-SNE visualization on the learned node embeddings. As shown in Figure 11, three observations can be found as follows.
Table 10: Discriminative capability testing experiments. Clustering performance of thirteen methods on six datasets.

Metric | DEC GAE DAEGC ARGA SDCN DFCN MVGRL MCGC SCAGC SCGC AGC-DRR DCRN IDCRN

DBLP
ACC 58.16 61.21 62.05 64.83 68.05 76.00 42.73 58.92 79.42 67.25 80.41 79.66 82.08
NMI 29.51 30.80 32.49 29.42 39.50 43.70 15.41 33.69 49.05 38.64 49.77 48.95 52.70
ARI 23.92 22.02 21.03 27.99 39.15 47.00 8.22 25.97 54.04 35.95 55.39 53.60 58.81
F1  59.38 61.41 61.75 64.97 67.71 75.70 40.52 50.39 78.88 67.06 79.90 79.28 81.47

CiteSeer
ACC 55.89 61.35 64.54 61.07 65.96 69.50 68.66 64.76 61.16 71.02 68.32 70.86 71.40
NMI 28.34 34.63 36.41 34.40 38.71 43.90 43.66 39.11 32.83 45.25 43.28 45.86 46.77
ARI 28.12 33.55 37.78 34.32 40.17 45.50 44.27 37.54 31.17 46.29 45.34 47.64 48.67
F1  52.62 57.36 62.20 58.23 63.62 64.30 63.71 59.64 56.82 62.24 64.82 65.83 66.27

ACM
ACC 84.33 84.52 86.94 86.29 90.45 90.90 86.73 91.64 91.83 89.80 92.55 91.93 92.58
NMI 54.54 55.38 56.18 56.21 68.31 69.40 60.87 70.71 71.28 66.60 72.89 71.56 73.17
ARI 60.64 59.46 59.35 63.37 73.91 74.90 65.07 76.63 77.29 72.37 79.08 77.56 79.18
F1  84.51 84.65 87.07 86.31 90.42 90.80 86.85 91.70 91.84 89.74 92.55 91.94 92.60

Amazon-Photo
ACC 47.22 71.57 76.44 69.28 53.44 76.88 45.19 OOM 75.25 77.48 78.11 79.94 80.17
NMI 37.35 62.13 65.57 58.36 44.85 69.21 36.89 OOM 67.18 67.67 72.21 73.70 74.32
ARI 18.59 48.82 59.39 44.18 31.21 58.98 18.79 OOM 56.86 58.48 61.15 63.69 64.10
F1  46.71 68.08 69.97 64.30 50.66 71.58 39.65 OOM 72.77 72.22 72.72 73.82 74.01

PubMed
ACC 60.14 62.09 68.73 65.26 64.20 68.89 67.01 60.97 OOM 63.14 - 69.87 70.02
NMI 22.44 23.84 28.26 24.80 22.87 31.43 31.59 33.39 OOM 27.77 - 32.20 33.29
ARI 19.55 20.62 29.84 24.35 22.30 30.64 29.42 29.25 OOM 24.73 - 31.41 32.67
F1  61.49 61.37 68.23 65.69 65.01 68.10 67.07 59.84 OOM 62.27 - 68.94 69.19

CoraFULL
ACC 31.92 29.60 34.35 22.07 26.67 37.51 31.52 29.08 OOM OOM - 38.80 39.45
NMI 41.67 45.82 49.16 41.28 37.38 51.30 48.99 36.86 OOM OOM - 51.91 52.83
ARI 16.98 17.84 22.60 12.38 13.63 24.46 19.11 13.15 OOM OOM - 25.25 25.97
F1  27.71 25.95 26.96 18.85 22.14 31.22 26.51 22.90 OOM OOM - 31.68 32.58
(a) Raw Data (b) DEC (c) GAE (d) ARGA (e) DFCN (f) IDCRN
Figure 11: Discriminative capability testing experiments. t-SNE visualization of node embeddings. The compared methods include DEC [29], GAE [4], ARGA [25], DFCN [34], and IDCRN [67]. The first and second rows are the results on the ACM and DBLP datasets, respectively.
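A minimal sketch of such a t-SNE visualization is given below; the embeddings and cluster labels are random stand-ins, and the perplexity and figure settings are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-ins for learned node embeddings and predicted clusters (illustrative only).
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 32))
labels = rng.integers(0, 3, size=300)

# Project the embeddings to 2D and color points by cluster assignment.
Z_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Z)
plt.scatter(Z_2d[:, 0], Z_2d[:, 1], c=labels, s=8, cmap="tab10")
plt.title("t-SNE of node embeddings")
plt.savefig("tsne_embeddings.png", dpi=200)
```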
Recently, we have witnessed the fast development of self-supervised learning on graphs [153]. The reconstructive deep graph clustering methods [4], [28], [33], [34], [39] reconstruct the graph information such as the node attributes and graph structure. Besides, the adversarial methods [25], [106], [115] generate adversarial samples and discriminate them from the real samples. In addition, the contrastive methods [66], [71], [104], [115] pull together the positive sample pairs while pushing away the negative sample pairs. Previous researchers designed different promising pre-text tasks to enhance the sample discriminative capability of networks. In the future, to further enhance the discriminative capability, the hard sample mining strategies [68], [149] will be an exciting future direction.

4.5 Unknown Cluster Number
Most of the existing deep graph clustering methods consider the cluster number as a given value, which is usually the same as the number of classes in the ground truth. Benefiting from this setting, recent deep graph clustering methods can achieve promising performance. The remarkable success of recent deep graph clustering algorithms relies on the pre-defined cluster number. However, in most real-world scenarios, the cluster number is an unknown value. For example, in Internet social networks, the users can be grouped into several clusters, but we cannot know the specific number of user groups. In addition, the number of user groups is the most important value in the user grouping algorithm. Therefore, we think the clustering performance of the existing deep graph clustering methods is overestimated.

Figure 13: Unknown cluster number experiments. Clustering performance of DCRN [66] with different cluster numbers K ∈ [2, 10] on the DBLP and CiteSeer datasets. (a) DBLP. (b) CiteSeer.

In order to verify our suspicion, we conduct experiments on two datasets, including DBLP and CiteSeer. Concretely, we adopt DCRN [66] as the baseline and evaluate its clustering performance with the NMI and ARI metrics using different cluster numbers ranging from 2 to 10. The experimental results are demonstrated in Figure 13. We carefully analyze the results and draw two conclusions as follows. Firstly, the baseline DCRN achieves the promising and best performance when the cluster number equals the class number of the ground truth. Here, the sample class number of the DBLP dataset is 4, and the sample class number of CiteSeer is 6. Secondly, other cluster numbers will lead to performance degeneration. When the cluster number reduces, many clusters will collapse into one cluster. Besides, when the cluster number increases, the large cluster will be split into many small clusters. Therefore, the cluster number is crucial for the clustering performance, and designing deep graph clustering methods without the pre-defined cluster number will be a challenging and meaningful direction.

To solve this problem, one potential solution is to perform clustering based on density [154]. Concretely, the clustering network should be designed to assign the high-density data areas as the cluster centers and regard the low-density data areas as the decision borders. Besides, another potential solution technology is deep reinforcement learning [86], [155]. To be specific, in an unsupervised scenario, recognizing the cluster number in the graph data can be modeled as a Markov decision process and be handled with deep reinforcement learning networks. More cluster number determination methods are surveyed in [156]. In the next section, we will introduce the applications of deep graph clustering in six domains.

5 APPLICATION & OPEN RESOURCE
5.1 Application
In recent years, we have witnessed the fast growth of deep graph clustering. Thanks to the researchers in this field, promising deep graph clustering methods are increasingly proposed. Benefiting from the strong data partitioning capability, deep graph clustering has been applied to various real-world application domains, such as natural language processing, computer vision, social network analyses, recommendation systems, bioinformatics, medical science, etc. The applications are demonstrated in Figure 14.

Next, we introduce the applications in detail. In the computer vision domain, deep graph clustering methods are applied to face analysis [157], [158], co-saliency detection [159], and video analyses [160]. Besides, document mining [161], speech separation [162], and large language models [163] are essential applications of deep graph clustering in the natural language processing area. In addition, deep graph clustering is crucial for social network analyses. It can be used to conduct community detection [164], [165], [166], [167] and anomaly detection [168], [169]. Similarly, deep graph clustering demonstrates high application value in recommendation systems. To be specific, it can help user grouping recommendations and user intent extraction [170]. Apart from social data mining, deep graph clustering is also vital to bioinformatics and medical science. Concretely, the applications in the bioinformatics field include molecular mining [171], [172], metagenomic binning [173], single-cell RNA sequencing [43], etc. Also, in the medical science domain, deep graph clustering methods are adopted in disease analyses [174], medical big data [175], and medical imaging [176]. In the future, we hope the researchers will further solve the challenges and apply deep graph clustering methods to broader and more significant fields.

5.2 Open Resource
In order to construct an open, active, and collaborative research & engineering atmosphere in the deep graph clustering field, we make efforts on the open-source projects of deep graph clustering. Concretely, we make a deep graph clustering collection and build a unified framework of deep graph clustering methods. We hope these open resources will help researchers quickly start up, solve critical problems, and apply deep graph clustering methods to a broader range of applications. Next, we introduce these two GitHub repositories in detail.

5.2.1 Awesome Deep Graph Clustering
Awesome Deep Graph Clustering (ADGC) is a collection of state-of-the-art, novel deep graph clustering methods, including papers, codes, and datasets.
Figure 14: Applications of deep graph clustering in six domains: natural language processing, computer vision, social network analyses, recommendation systems, bioinformatics, and medical science.
In addition, this project open-sources some common data processing, clustering, metric calculation, and visualization functions. The project address can be found at GitHub (https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering). Up to this paper submission time, we have collected about 130 papers and codes and about 20 datasets. Besides, this GitHub repository has gained about 500 stars and 100 forks. We welcome any issues, pull requests, contributions, discussions, etc.

We share two open resources on the GitHub platform. Among them, Awesome Deep Graph Clustering is a comprehensive collection of deep graph clustering, including papers, codes, and datasets. Besides, A-Unified-Framework-for-Deep-Attribute-Graph-Clustering is a unified framework of deep graph clustering. We hope this work becomes a quick guide for researchers and motivates them to solve significant problems in deep graph clustering.
Yue Liu is a computer science master's student at the National University of Defense Technology (NUDT). His research interests include self-supervised learning, deep graph clustering, and knowledge graphs. He has published 10+ peer-reviewed papers at venues including ICLR, ICML, AAAI, IJCAI, ACM MM, SIGIR, IEEE T-KDE, and IEEE T-NNLS. More detailed information is listed on his homepage: https://yueliu1999.github.io/.

Ke Liang is currently pursuing a Ph.D. degree at the National University of Defense Technology (NUDT). Before joining NUDT, he received his BSc degree from Beihang University (BUAA) in 2017 and his MSc degree from Pennsylvania State University (PSU) in 2021. His research interests include knowledge graphs, graph learning, healthcare AI, and urban computing. He has published several papers in top-tier journals and conferences, e.g., IEEE T-KDE, IEEE T-NNLS, SIGIR, AAAI, ICML, and ACM MM.

Chenchen Fan is a postdoctoral researcher at the Chinese PLA General Hospital. Prior to this, he obtained his PhD from the Institute of Automation, Chinese Academy of Sciences, and his bachelor's degree from the Beijing University of Chemical Technology. His research interests include medical artificial intelligence, medical image processing, etc.

Kunlun He received his M.D. degree from the 3rd Military Medical University, Chongqing, China, in 1988, and his Ph.D. degree in Cardiology from the Chinese PLA Medical School, Beijing, China, in 1999. He worked as a postdoctoral research fellow at the Division of Circulatory Physiology of Columbia University from 1999 to 2003. He is the director and a professor of the Medical Big Data Research Center, Chinese PLA General Hospital, Beijing, China. His research interests include medical big data and artificial intelligence for cardiovascular disease.