[go: up one dir, main page]

0% found this document useful (0 votes)
45 views20 pages

A Survey of Deep Graph Clustering

A Survey of Deep Graph Clustering: Taxonomy, Challenge, Application, and Open Resource

Uploaded by

Sinelson Hu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
45 views20 pages

A Survey of Deep Graph Clustering

A Survey of Deep Graph Clustering: Taxonomy, Challenge, Application, and Open Resource

Uploaded by

Sinelson Hu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

1

A Survey of Deep Graph Clustering: Taxonomy,


Challenge, Application, and Open Resource
Yue Liu, Jun Xia, Sihang Zhou,
Xihong Yang, Ke Liang, Chenchen Fan, Yan Zhuang,
Stan Z. Li, Fellow, IEEE, Xinwang Liu† , Senior Member, IEEE, Kunlun He†

Abstract—Graph clustering, which aims to divide nodes in the graph into several distinct clusters, is a fundamental yet challenging task.
Benefiting from the powerful representation capability of deep learning, deep graph clustering methods have achieved great success in
recent years. However, the corresponding survey paper is relatively scarce, and it is imminent to make a summary of this field. From
arXiv:2211.12875v4 [cs.LG] 12 Sep 2023

this motivation, we conduct a comprehensive survey of deep graph clustering. Firstly, we introduce formulaic definition, evaluation,
and development in this field. Secondly, the taxonomy of deep graph clustering methods is presented based on four different criteria,
including graph type, network architecture, learning paradigm, and clustering method. Thirdly, we carefully analyze the existing methods
via extensive experiments and summarize the challenges and2022/11/21
opportunities
14:18 from five perspectives, including taxonomy.drawio
graph data quality, stability,
scalability, discriminative capability, and unknown cluster number. Besides, the applications of deep graph clustering methods in six
Pure Structure Graph
domains, including computer vision, natural language processing, recommendation
Attribute Graph systems, social network analyses, Network MLP
bioinformatics,
Graph Type GNN
and medical science, are presented. Last but not least, thisHeterogeneous Graph
paper provides Architecture
open resource supports, including 1) a collection
Hybrid
Dynamic Graph
(https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering) of state-of-the-art deep graph clustering methods (papers, codes,
and datasets) and 2) a unified framework (https://github.com/Marigoldwu/A-Unified-Framework-for-Deep-Attribute-Graph-Clustering)
of deep graph clustering. We hope this work can serve as a quick guide and help researchers overcome Taxonomychallenges
of in this vibrant field.
Deep Graph Clustering
Index Terms—Deep Graph Clustering, Graph Neural Network, Self-Supervised Learning, Clustering Analysis.
Reconstructive
Traditional Clustering Clustering Learning Adversarial

Neural Clustering
Method Paradigm Contrastive
Hybrid

1 I NTRODUCTION
R aph clustering is an important and challenging task to is trained in a self-supervised manner to embed the nodes into the
G separate nodes of the graph into different clusters in an
unsupervised manner. In recent years, benefiting from the strong
latent space. After encoding, the clustering method C separates the
node embeddings Z into several disjoint clusters. The formulaic
representation capability of deep learning [1], especially graph definitions and the important baselines of deep graph clustering
neural networks (GNNs) [2], [3], [4], [5], [6], [7], [8], deep graph are deliberated in Section 2.
clustering has witnessed fruitful advances. However, unlike the
deep clustering area [9], [10], [11], [12], the survey paper of deep Graph
Heterogeneous
Neural
Graph
graph clustering [13] is relatively scarce. For example, [14], [15] Pure Network Hybrid
Structure Method
mainly survey the papers about community detection with deep Graph
Attribute Network
learning. To better assist researchers in reviewing, summarizing, Graph Type
Graph
Architecture
and planning for the future, a comprehensive survey of deep
graph clustering is expected. From this motivation, we make a Dynamic
Multi-layer

comprehensive survey of deep graph clustering in this paper. Graph Taxonomy of Perceptron

2022/11/21 14:37 overall.drawio


Deep Graph Clustering
Traditional Contrastive

Input Graph Clustering Result Clustering

Clustering Learning Hybrid


1 2 Method Paradigm Method

1 2 Neural
Reconstructive Adversarial
Clustering

3
3
Figure 2: The taxonomy of deep graph clustering.

Figure 1: The general pipeline of deep graph clustering.


Secondly, as shown in Figure 2, we contribute a structured
taxonomy to provide a broad overview of this field, categoriz-
Firstly, we demonstrate the general pipeline of deep graph ing existing works from four perspectives: graph type, network
clustering. As shown in Figure 1, the encoding neural network F architecture, learning paradigm, and clustering method. More
specifically, the input graph type can be classified into four distinct
• Y. Liu, S. Zhou, X. Yang, K. Liang, and X. Liu are with Na- categories: pure structure graph, attribute graph, heterogeneous
tional University of Defense Technology, Changsha, China. (E-mail:
yueliu19990731@163.com). J. Xia and Stan Z. Li are with Westlake
about:blank
graph, and dynamic graph. We analyze the characteristics of each 1/1
University, Hangzhou, China. C. Fan and Y. Zhuang are with Big Data graph type and introduce the corresponding processing methods.
Research Center, Chinese PLA General Hospital, Beijing, China. Besides, for the network architecture, the existing deep graph
• † : Corresponding author.
clustering methods are grouped into multi-layer perceptron-based
2

(MLP-based) methods, graph-neural network-based (GNN-based) Table 1: Basic notation summary.


methods, and hybrid methods. The benefits and drawbacks of each
type are carefully discussed. Moreover, the learning paradigms are Notations Meanings
classified into reconstructive, adversarial, contrastive, and hybrid GS Pure structure graph
paradigms. For each learning paradigm, the general pipeline is GA Attribute graph
summarized in detail. In addition, the clustering methods are GH Heterogeneous graph
separated into traditional clustering methods and neural clustering GD Dynamic graph
methods. The advantages and disadvantages of traditional clus- V Set of nodes
tering are analyzed, and the technological evolution of neural E Set of edges
F Encoding network
clustering is summarized in depth. We elaborate on the taxonomy
C Clustering method
of deep graph clustering in Section 3. M Evaluation method
Despite the remarkable progress, this fast-growing field is n∈R Sample number
still fraught with several crucial challenges. Therefore, we con- d′ ∈ R Attribute number
duct comprehensive experiments to summarize and analyze the d∈R Dimension number of learned feature
challenges of deep graph clustering. In addition, we carefully k∈R Cluster number
discuss the potential opportunities to solve the challenges in this s∈R Clustering score
field. Specifically, Figure 3 demonstrates the problems of graph y ∈ Rn Ground truth label vector
data quality, stability, scalability, discriminative capability, and ŷ ∈ Rn Predicted cluster-ID vector

unknown cluster number. The detailed analysis and the potential X ∈ Rn×d Node attribute matrix
solutions are provided in Section 4. A ∈ Rn×n Original adjacency matrix
2022/11/22 00:29 challenge.drawio Z ∈ Rn×d Node embeddings
3 or 4 Clusters?
C ∈ Rk×d Cluster center embeddings
? Q ∈ Rn×k Soft clustering assignment matrix
P ∈ Rn×k High confidence clustering assignment matrix
Link
Error

Graph Data Unknown


Quality Cluster Number 2 D EEP G RAPH C LUSTERING
Error Attribute

Indiscriminative In this section, we first introduce the basic notation and the
formulaic definition of deep graph clustering. Then the critical
Stability Challenges in
Deep Graph Clustering deep graph clustering baselines are deliberated.
Discriminative
Stable
NMI: 79.60±0.50 Discriminative 2.1 Notation
ARI: 45.80±0.45
Capability
NMI: 80.34±5.21 Scalability The basic notations in this paper are summarized in Table 1. First,
ARI: 46.21±7.45
Unstable
we take the attribute graph as an example input of the deep graph
clustering methods. Given an attribute graph GA = {V, E, X},
1B Edges V = {v1 , v2 , . . . , vn } denotes a node set, which contains k
111M Nodes
classes of n nodes. Besides, E denotes a set of edges that link
Figure 3: The challenges in deep graph clustering. nodes. In addition, each node in the graph attaches d′ -dimension

attributes. In matrix form, X ∈ Rn×d denotes the node attribute
n×n
matrix and A ∈ R denotes the adjacency matrix. y ∈ Rn
Moreover, in recent years, deep graph clustering methods have
denotes ground truth labels of nodes. We introduce other types of
been successfully applied to numerous domains, such as social net-
the input graph in Section 3.1.
work analysis, recommendation systems, computer vision, natural
language processing, bioinformatics, medical science, etc. The
detailed information refers to Section 5. The main contributions 2.2 Task Definition
of this paper are summarized as follows. The target of deep graph clustering is to encode nodes in the
• We present a comprehensive survey paper in deep graph graph with the neural networks and divide them into different
clustering domain to help researchers review, summarize, clusters. Figure 1 demonstrates this process. In formulaic, the
solve challenges, and plan for future. self-supervised neural network F first encodes the nodes of the
• We design a taxonomy of recent deep graph clustering attribute graph GA as follows.
methods based on four aspects, i.e., graph type, network
Z = F(GA ) = F(X, A), (1)
architecture, learning paradigm, and clustering method.

• We conduct extensive experiments to summarize the chal- where X ∈ Rn×d denotes the node attribute matrix, A ∈ Rn×n
lenges in the deep graph clustering field from five per-
about:blank
denotes
1/1
the adjacency matrix, and Z ∈ Rn×d denotes the learned
spectives, including graph data quality, stability, scalability, node embeddings. Generally, the self-supervised neural network
discriminative capability, and unknown cluster number. Be- F is trained with pre-text tasks like structure reconstruction,
sides, through the careful analyses, we provide the potential attribute reconstruction, contrastive learning, adversarial learning,
promising technical solutions. etc. Details about network architecture and learning paradigm
• We share two practical open resources, including a collec- are introduced in Section 3.2 and Section 3.3, respectively. After
tion of state-of-the-art deep graph clustering methods and a encoding, the clustering method C groups the nodes into several
unified framework of deep graph clustering. disjoint clusters as follows.
3

Stage Ⅴ & Ⅵ
Reconstructive Contrastive Scalability & Complex Scenses
Stage Ⅳ
DMoN
Adversarial Other Contrastive Learning

Stage Ⅲ
CCGC
Data Fusion
CONGRE
MinCutPool S3GC
GATE
Stage Ⅱ
MVGRL MCGC DCRN CARL-G
Adversarial Learning

Community
AGE HeCo SUBLIME HSAN
GAN
Stage Ⅰ
Graph Clustering via DNNs ProGAN MAGCN SGCMC AFGRL Dink-Net

GAE GALA O2MAC GDCL scTAG SCGC

Graph
HNE DNGR MGAE ARGA DAEGC SDCN DFCN R-GAE DGCN
Encoder

2014 2015 2016 2017 2018 2019 2020 2021 2022 2023

Figure 4: Time line of the important baselines in deep graph clustering field.

the non-linear node representations and then perform k -means


ŷ = C(Z, k), (2) [18] to separate the node embeddings into disjoint clusters in
the GraphEncoder model. After that, DNGR [19] is developed to
where k denotes the cluster number and ŷ ∈ Rn denotes predicted capture graph structural information by the random surfing model.
cluster-ID vector. We deliberate the details of the clustering Although verified effectiveness, the previous methods merely learn
method C in Section 3.4. After the clustering process, evaluation the graph structure information while ignoring the node attributes.
method M tests the clustering performance with various metrics GAE/VGAE [4] is proposed with the graph convolutional encoder
as follows. [2] and a simple inner product decoder to learn both attribute and
s = M(y, ŷ), (3) structural information. Meanwhile, to handle the heterogeneous
graphs, [20] develop a deep heterogeneous graph embedding
where s ∈ R denotes the clustering score and y ∈ Rn denotes
algorithm termed HNE for five downstream tasks. Motivated by
the ground truth label vector. In the next section, we introduce the
GAE/VGAE [4], [21] propose MGAE to learn node representa-
development of deep graph clustering.
tions with the graph auto-encoder and then group the nodes into
clusters with the spectral clustering algorithm [22].
2.3 Development Stage II: introduce adversarial mechanism. Subsequently,
Overview. In this section, we introduce the development of deep motivated by the generative adversarial mechanism [23], several
graph clustering methods. At first, as shown in Figure 4, we adversarial deep graph clustering methods are proposed. For ex-
summarize the essential baselines in the deep graph clustering ample, [24], [25] propose ARGA/ARVGA by enforcing the latent
field. These papers promote the crucial development of deep representations to align a prior distribution in an adversarial learn-
graph clustering and gradually become the milestones in this ing manner. Similarly, ProGAN [26] and CommnityGAN [27] also
field. Additionally, these research papers have been published in utilize generative adversarial networks to generate proximities and
influential international conferences or high-quality journals in optimize embeddings, respectively.
artificial intelligence, machine learning, data mining, computer Stage III: unified framework & data fusion. Although
vision, multimedia, etc. In Figure 4, the important baselines are promising performance has been achieved, [28] indicates that
displayed according to the publish time. Besides, in order to the previous methods are not designed for the specific cluster-
highlight the development trend of deep graph clustering, we ing task. To design a clustering-directed method, [28] propose
roughly categorize the methods into four classes according to a unified framework termed DAEGC with the attention-based
the learning paradigm. The blue, red, yellow, and green boxes graph encoder and clustering alignment loss adopted in deep
indicate the reconstructive methods, adversarial methods, con- clustering methods [29], [30]. In the same year, [31] designed
trastive methods, and other methods. We find that most of the a new symmetric graph auto-encoder architecture named GALA
early methods are based on the reconstructive and adversarial based on the Laplacian sharpening. Besides, [32] proposes AGC
learning paradigms. Recently, contrastive deep graph clustering model with an adaptive graph convolution to capture the clustering
methods become popular and mainstream. Next, we introduce the information in different neighbor hops. Subsequently, SDCN [33]
development of deep graph clustering methods in detail. and DFCN [34] verify the effectiveness of integrating structural
Stage I: graph clustering via deep neural networks. At information and attribute information by the delivery operator and
the early stage, motivated by the great success of deep learning, the information fusion module, respectively. Then, to avoid the
researchers aim to endow the graph clustering methods with strong expensive costs of spectral clustering, [35] formulate a continuous
representation capability of deep neural networks. Concretely, relaxation of the normalized minCUT problem and optimize the
the pioneers [16] adopt the sparse auto-encoder [17] to learn clustering objective with the GNN. It is worth mentioning that
4

some graph pooling methods [36], [37], [38] also contribute to 2.4 Evaluation
this field. Then, O2MAC [39] and MAGCN [40] make attempts This section introduces the evaluation of deep graph clustering.
to exploit deep neural networks for attributed multi-view graph Deep graph clustering is a purely unsupervised task. Therefore
clustering via the multiple graph view reconstruction and the it is hard to evaluate the clustering performance of deep graph
view-consistency information learning, respectively. Besides, a clustering methods without the ground truth. Generally, an ex-
self-supervised method SGCMC [41] utilizes the clustering labels cellent deep graph clustering algorithm can learn the clustering
to guide the network learning, thus improving clustering perfor- distribution that has small with-cluster variance and significant
mance. R-GAE [42] rethink the graph-auto-encoder-based deep between-cluster variance. The mainstream metrics can be cate-
graph clustering methods by considering feature randomness and gorized into two classes [76], i.e., extrinsic metrics and intrinsic
feature drift. And scTAG [43] apply the graph-auto-encoder-based metrics. Calculating extrinsic metrics requires the ground truth
methods to the single-cell RNA sequencing. while calculating the intrinsic metrics does not require labels. In
the research track, researchers conduct experiments on graph data
Stage IV: apply contrastive learning. More recently, con- with human annotations, thus the extrinsic metrics of clustering
trastive learning has become the mainstream paradigm in the are more common. However, in the industry scenario, labels of
domains of vision [44], [45], [46], [47], [48], [49], [50] and graph nodes are always scarce, therefore the intrinsic metrics are more
[51], [52], [53], [54], [55], [56], [57], and contrastive deep graph practical.
clustering methods are increasingly proposed. Specifically, in the Firstly, we introduce four extrinsic metrics, including accuracy
AGE model, [58] first filters the high-frequency noises in node (ACC) [77], normalized mutual information (NMI) [78], adjust
attributes and then trains the encoder by adaptively discriminating rand index (ARI) [79], and F1 score. These metrics need the
the positive and negative samples. In the same year, MVGRL ground truth labels, therefore the typical pre-process operation
[59] generates an augmented structural view and contrasts node is to map the cluster-ID to the class-ID by the Kuhn-Munkres
embeddings from one view with graph embeddings of another algorithm [80]. Then the metrics can be calculated. Specifically,
view and vice versa. Following up, MCGC [60] and HeCo [61] accuracy (ACC) is calculated as follows:
extend the contrastive paradigm to multi-view clustering and
n
heterogeneous graph learning. X ϕ(yi , map(ŷi ))
ACC = , (4)
i=1
n
Stage V: improve contrastive learning. Although the effec-
tiveness of the contrastive learning paradigm has been verified, where ŷi and yi denote the predicted cluster-ID and the ground
there are still many open technical problems. Specifically, [62] truth label of the i-th sample, respectively. Besides, map(·) de-
proposes GDCL to correct sampling bias in contrastive deep notes the Kuhn-Munkres algorithm [80], which maps the predicted
graph clustering. Besides, to avoid the semantic drift caused by cluster-ID to the class-ID. ϕ(·) is an indicator function as formu-
inappropriate data augmentations, AFGRL [63] is proposed by lated: (
replacing data augmentations with node discovery. Differently, 1 if yi = map(ŷi ),
ϕ(yi , map(ŷi )) = (5)
Liu et al. [64] propose an augmentation-free contrastive deep 0 otherwise.
graph clustering method by designing the parameter-unshared
In addition, the F1-score also evaluates the clustering performance.
encoders. Then, to refine the noisy connections in the graph, [65]
It is a balance of precision and recall, calculated as follows.
propose SUBLIME by generating the sketched graph view with
unsupervised structure learning. Moreover, [66], [67] design the precision · recall
F1 = 2 · , (6)
dual correlation reduction strategy in the DCRN model to alleviate precision + recall
the representation collapse problem in deep graph clustering. To
further enhance the discriminative capability of networks, Liu TP TP
precision = , recall = , (7)
et al. [68] guide the networks to learn the hard sample pairs. TP + FP TP + FN
CCGC [69] propose a new positive and negative sample pairs where TP, FP, and FN denote true positive samples, false positive
construction method. samples, and false negative samples, respectively. Further, the
calculation of Normalized Mutual Information (NMI) is based on
Stage VI: scale to large graphs & apply to more complex mutual information. It is more robust to the unbalanced sample
scenarios. However, previous methods fail to scale to the large distribution. It can be formulated as follows.
graphs, easily leading to the out-of-memory and long running time P P p(ŷ,y)
problem. To this end, Devvrit et al. [70] scale the contrastive deep 2 ŷ y p(ŷ, y) log p(ŷ)p(y)
NMI = − P , (8)
graph clustering to large-scale graphs. Additionally, Dink-Net [71]
P
i p(ŷi ) log(p(ŷi )) + j p(yj ) log(p(yj ))
unified the representation learning and clustering optimization into
an end-to-end framework on large-scale graphs via dilation and where p(ŷ), p(y), and p(ŷ, y) denotes the distribution of the
shrink cluster loss functions. Besides, Shiao et al. [72] utilize the predicted results, distribution of the ground truth, and joint dis-
node clustering method to accelerate graph representation learn- tribution of them, respectively. Differently, another metric named
ing. Besides, Sun et al. [73] rethink the graph clustering problem Adjust Rand Index (ARI) is based on the pairwise similarity
from a geometric perspective and introduce the heterogeneous between predicted labels and ground truth labels. It is formulated
curvature space into deep graph clustering. To adapt deep graph as follows:
RI − expectedRI
clustering methods to both homophily and heterophily graphs, ARI = , (9)
DGCN [74] is proposed by designing the mixed graph filter and
max(RI) − expectedRI
the dual encoders. And Wen et al. [75] propose a framework to where RI denote rand index [79] and expectedRI [79] denote the
match the unpaired multi-view graphs. expected rand index. ARI equals 0 indicates real and modeled
5

clustering do not agree on pairing, and ARI equals 1 indicates real


and modeled clustering both represent the same clusters.
Secondly, we introduce three intrinsic metrics, including Sil-
houette Coefficient (SC) [81], Calinski-Harabasz Index (CHI)
[82], and Davies-Bouldin Index (DBI) [83]. Different from pre-
vious extrinsic metrics, the intrinsic metrics are more piratical for
real-world data since they get rid of the ground truth labels. Con-
cretely, Silhouette Coefficient (SC) can be calculated as follows.
1
dis − dis′
SC = , (10)
max(dis, dis′ )
where dis′ denotes the average distance between the node and
all other nodes in the same cluster while dis denotes the average
distance between the node and all other points in the next nearest
cluster. SC is intuitive and easy to understand and has a range of
[−1, 1]. Additionally, Calinski-Harabasz Index can be calculated
as follows. Figure 5: The demonstrations of pure structure graph GS , attribute
k
X graph GA , heterogeneous graph GH , and dynamic graph GD . The
b= mi (ci − ĉ)(ci − ĉ)T , (11) pure structure graph merely contains the connected edges between
i=1
nodes. In addition, each node in the attribute graph attaches
k
X X attributes. Moreover, in heterogeneous graphs, the node types and
w= (x − ci )(x − ci )T , (12) edge types are multiple. Furthermore, the nodes and edges of
i=1 x∈i-th cluster
dynamic graphs will change over time.
b n−k
CHI = × , (13)
w k−1
where b denotes the between-cluster dispersion, and w denotes 3.1.1 Pure Structure Graph
the within-cluster dispersion. n is the number of all nodes, k is In a pure structure graph GS = {V, E}, V = {v1 , v2 , . . . , vn }
the cluster number, and mi is the number of nodes in the i- denotes the node set, which contain n nodes of k categories.
th cluster. Besides, ĉ denote the cluster center of all samples, Besides, E denotes the edge set, which contains l edges. With
and ci denote the i-th cluster center. CHI measures the between- the matrix form, the pure structure graph can be represented by
cluster dispersion against within-cluster ones. The higher score the adjacency matrix A ∈ Rn×n . Here, aij = 1 indicates that
indicates better clusters. Unlike CHI, Davies-Bouldin Index (DBI) i-th node is linked to j -th node while aij = 0 indicates that i-th
can measure the cluster size against the average distance between node is not linked to j -th node in the graph.
clusters. And the lower score indicates better clusters. DBI can be
calculated as follows. 3.1.2 Attribute Graph
si + sj 1X
k Compared with the pure structure graph GS , the attribute graph
rij = , DBI = max rij , (14) GA = {V, E, X} is more complex. Specifically, each node in GA
s′ij k i=1 i̸=j ′
attaches the d′ -dimension attributes. X ∈ Rn×d is denoted as the

where si denotes the average distance between each nodes in the node attribute matrix, where d is the dimension number of node
i-th cluster to the i-th cluster center ci . s′ij denotes the distance attributes. In the matrix form, GA is represented by the adjacency
between the i-th cluster center ci to the j -th cluster center cj . matrix A ∈ Rn×n and node attribute matrix X.
In this paper, the used datasets are open benchmarks, which
have the ground truth of the nodes. Therefore, we adopt four 3.1.3 Heterogeneous Graph
extrinsic metrics, including ACC, NMI, ARI, and F1 score in our The heterogeneous graph at least contains more than one type of
experiments. node or more than one type of edge. Formally, we first denote
the type number of nodes and edges as Tn and Te . Then a
3 TAXONOMY heterogeneous graph GH satisfies Tn + Te > 2, i.e., it contains
We contribute a structured taxonomy to provide a broad overview multiple types of nodes or/and multiple types of edges. Otherwise,
of this field. Concretely, this section introduces the taxonomy of the graph is homogeneous.
deep graph clustering methods from the following four perspec-
tives: graph type, network architecture, learning paradigm, and 3.1.4 Dynamic Graph
clustering method. The surveyed paper are categorized based these Different from the static graph, the dynamic graph GD =
(1) (t) (T )
criteria in Table 2 (part I) and Table 3 (part II). Next, we present {GD , ..., GD , ..., GD } will dynamically change over time,
taxonomy criteria in detail. where t denotes the time step. The changes in dynamic graphs
include changing the linkages between nodes, adding or deleting
3.1 Graph Type nodes, modifying the node features, etc.
First of all, we begin with the input of deep graph clustering. The
input graphs of existing deep graph clustering methods are mainly 3.1.5 Processing Techniques
categorized into four types. Figure 5 demonstrates the details of The first type of graph, the pure structure graph, is relatively easy
these four types of graphs. In what follows, we provide the detailed to process since it merely contains structural information. For
definitions of these graph types. example, in early works, [16] encodes the structural embeddings
6

Table 2: Taxonomy table of deep graph clustering methods: part I. The criteria contain four aspects: graph type, network architecture,
learning paradigm, and clustering method.

Methods Graph Type Network Architecture Learning Paradigm Clustering Method


GraphEncoder [16] Pure Structure Graph MLP Reconstructive Traditional Clustering
DNGR [19] Pure Structure Graph MLP Reconstructive Traditional Clustering
CommunityGAN [27] Pure Structure Graph MLP Adversarial Neural Clustering
NetVAE [84] Attribute Graph MLP Reconstructive Traditional Clustering
DGCN [74] Attribute Graph MLP Reconstructive Neural Clustering
GRACE [85] Attribute Graph MLP Reconstructive Neural Clustering
ProGAN [26] Attribute Graph MLP Adversarial Traditional Clustering
AGE [58] Attribute Graph MLP Contrastive Traditional Clustering
SCGC [64] Attribute Graph MLP Contrastive Traditional Clustering
CCGC [69] Attribute Graph MLP Contrastive Traditional Clustering
HSAN [68] Attribute Graph MLP Contrastive Traditional Clustering
RGC [86] Attribute Graph MLP Contrastive Traditional Clustering
CONVERT [87] Attribute Graph MLP Contrastive Traditional Clustering
GCC-LDA [88] Attribute Graph MLP Contrastive Traditional Clustering
GALA [31] Attribute Graph GNN Reconstructive Traditional Clustering
MGAE [21] Attribute Graph GNN Reconstructive Traditional Clustering
CGCN [89] Attribute Graph GNN Reconstructive Traditional Clustering
GCLN [90] Attribute Graph GNN Reconstructive Traditional Clustering
AHGAE [91] Attribute Graph GNN Reconstructive Traditional Clustering
RWR-GAE [92] Attribute Graph GNN Reconstructive Traditional Clustering
EGAE [93] Attribute Graph GNN Reconstructive Traditional Clustering
DGCSF [94] Attribute Graph GNN Reconstructive Traditional Clustering
DAEGC [28] Attribute Graph GNN Reconstructive Neural Clustering
DGVAE [95] Attribute Graph GNN Reconstructive Neural Clustering
CDRS [96] Attribute Graph GNN Reconstructive Neural Clustering
GCC [97] Attribute Graph GNN Reconstructive Neural Clustering
GC-SEE [98] Attribute Graph GNN Reconstructive Neural Clustering
FT-VGAE [99] Attribute Graph GNN Reconstructive Neural Clustering
scTAG [43] Attribute Graph GNN Reconstructive Neural Clustering
GC-VAE [100] Attribute Graph GNN Reconstructive Neural Clustering
DNENC [101] Attribute Graph GNN Reconstructive Neural Clustering
JANE [102] Attribute Graph GNN Adversarial Traditional Clustering
SAIL [103] Attribute Graph GNN Contrastive Traditional Clustering
AFGRL [63] Attribute Graph GNN Contrastive Traditional Clustering
S3 GC [70] Attribute Graph GNN Contrastive Traditional Clustering
CONGREGATE [73] Attribute Graph GNN Contrastive Traditional Clustering
SCAGC [104] Attribute Graph GNN Contrastive Neural Clustering
CommDGI [105] Attribute Graph GNN Contrastive Neural Clustering
CARL-G [72] Attribute Graph GNN Contrastive Neural Clustering

with the sparse auto-encoders [17]. Besides, [19] apply random datasets for temporal deep graph clustering.
surfing in the graph and embed the graph structure with the
stacked denoising auto-encoder. For attribute graphs, the addi- 3.2 Network Architecture
tional attribute information always brings more process operations For the network architecture, the mainstream deep graph clustering
and performance improvement. For instance, [21] integrates the methods can roughly be categorized into three classes: MLP-based
attributes and the graph structure with the graph convolutional method, GNN-based method, and hybrid methods.
operation [2]. Also, [33] transfers the attribute embeddings to
3.2.1 MLP-based Methods
the GCN layer with the delivery operator. Moreover, [34], [111]
demonstrate the effectiveness of the attribute-structure fusion The MLP-based methods utilize the multi-layer perceptrons
mechanism. The heterogeneous graph is more complex since it (MLPs) [123] to exact the informative features in the graphs. For
may contain different types of nodes and edges. To handle these example, GraphEncoder [16] and DNGR [19] encode the graph
graphs, [61] adopts the meta-path technology while [39] treats structure with the auto-encoders. Subsequently, in ProGAN [26]
them as multi-view graphs. Different from the aforementioned and CommunityGAN [27], the authors adopt MLPs to design the
static graphs, dynamic graphs change over time, increasing the generative adversarial networks. Moreover, based on MLPs, AGE
difficulty of clustering. To solve this problem, [120] performs [58] and SCGC [64] design the adaptive encoder and the parameter
graph clustering in an incremental learning fashion. [118] adopt un-shared encoders to embed the smoothed node features into the
the variant of gated recurrent unit (GRU) [121] to capture the latent space. Although the effectiveness of these methods has been
temporal information. Liu et al. [119] present a general frame- demonstrated, it is difficult for MLPs to capture non-euclidean
work for temporal deep graph clustering and [122] collect several structural information in graphs. Thus, GNN-based methods have
been increasingly proposed in recent years.
7

Table 3: Taxonomy table of deep graph clustering methods: part II. The criteria contain four aspects: graph type, network architecture,
learning paradigm, and clustering method.

Methods Graph Type Network Architecture Learning Paradigm Clustering Method


AGAE [106] Attribute Graph GNN Reconstructive+Adversarial Traditional Clustering
ARGA [25] Attribute Graph GNN Reconstructive+Adversarial Traditional Clustering
WARGA [107] Attribute Graph GNN Reconstructive+Adversarial Traditional Clustering
GCI [108] Attribute Graph GNN Reconstructive+Adversarial Traditional Clustering
NCAGC [109] Attribute Graph GNN Reconstructive+Contrastive Neural Clustering
GEC-CSD [110] Attribute Graph GNN Reconstructive+Adversarial Neural Clustering
SDCN [33] Attribute Graph MLP+GNN Reconstructive Neural Clustering
DFCN [34] Attribute Graph MLP+GNN Reconstructive Neural Clustering
AGCN [111] Attribute Graph MLP+GNN Reconstructive Neural Clustering
R-GAE [42] Attribute Graph MLP+GNN Reconstructive Neural Clustering
DAGC [112] Attribute Graph MLP+GNN Reconstructive Neural Clustering
MVGRL [59] Attribute Graph MLP+GNN Contrastive Traditional Clustering
SUBLIME [65] Attribute Graph MLP+GNN Contrastive Traditional Clustering
GDCL [62] Attribute Graph MLP+GNN Contrastive Neural Clustering
Dink-Net [71] Attribute Graph MLP+GNN Contrastive Neural Clustering
SELENE [113] Attribute Graph MLP+GNN Reconstructive+Contrastive Traditional Clustering
DCRN [66] Attribute Graph MLP+GNN Reconstructive+Contrastive Neural Clustering
IDCRN [67] Attribute Graph MLP+GNN Reconstructive+Contrastive Neural Clustering
SCGC [114] Attribute Graph MLP+GNN Reconstructive+Contrastive Neural Clustering
AGC-DRR [115] Attribute Graph MLP+GNN Adversarial+Contrastive Neural Clustering
DMGC [116] Heterogeneous Graph MLP Reconstructive Traditional Clustering
MAGCN [40] Heterogeneous Graph GNN Reconstructive Neural Clustering
O2MAC [39] Heterogeneous Graph GNN Reconstructive Neural Clustering
HeCo [61] Heterogeneous Graph GNN Contrastive Traditional Clustering
SGCMC [41] Heterogeneous Graph GNN Reconstructive+Contrastive Neural Clustering
VaCA-HINE [117] Heterogeneous Graph GNN Reconstructive+Contrastive Traditional Clustering
HNE [20] Heterogeneous Graph MLP+CNN Reconstructive Traditional Clustering
VGRGMM [118] Dynamic Graph GNN Reconstructive Traditional Clustering
TGC [119] Dynamic Graph MLP Reconstructive Neural Clustering
CGC [120] Dynamic Graph GNN Contrastive Traditional Clustering

3.2.2 GNN-based Methods 3.3 Learning paradigm


The GNN-based methods model the non-euclidean graph data with Based on the learning paradigm, the surveyed methods contain
the GNN encoders like graph convolutional network [2], graph four categories: reconstructive methods, adversarial methods, con-
attention network [3], graph auto-encoder [4], etc. Benefiting from trastive methods, and hybrid methods. Taking the attribute graph
the strong graph structure representation capability, GNN-based as the input graph, we elaborate on these learning paradigms of
2022/11/22 01:43 reconstructive.drawio

methods have achieved promising performance. For instance, deep graph clustering methods as follows.
MGAE [21] is proposed to learn the node attribute and graph
structure with the designed graph auto-encoder. In addition, [31] ?
Decoder
Encoder

design a novel symmetric graph auto-encoder named GALA. Fur-


thermore, the GNNs are also applied to the heterogeneous graphs
in O2MAC [39], MAGCN [40], SGCMC [41], and HeCo [61].
?
However, the entanglement of transformation and propagation in
GNNs will incur heavy computational overhead. Thus, SCGC Input Reconstructed
1
[64] is proposed to improve the efficiency and scalability of the Graph 2 Graph
1 2
existing deep graph clustering methods by decoupling these two 3 3
operations.
Figure 6: The general pipeline of the reconstructive deep graph
3.2.3 Hybrid Methods clustering methods. Firstly, the nodes of the input graph are
encoded into the node embeddings Z via a designed Encoder.
More recently, some researchers have considered integrating the
Subsequently, with the reconstruction pre-text tasks like attribute
advantages of both MLP-based and GNN-based methods. Con-
reconstruction or link reconstruction, the decoder aims to recover
cretely, [33] transfers the embeddings from the auto-encoder to the
the graph information from the learned embeddings Z. Eventually,
GCN layer with a designed delivery operator. In addition, AGCN
in the latent space, the clustering method C groups the nodes into
[111] and DFCN [34] demonstrate the effectiveness of the fusion
different clusters.
of the node attribute feature and the topological graph feature
with the combination of auto-encoder and GCN. Moreover, some
contrastive deep graph clustering methods [59], [62], [65], [66], 3.3.1 Reconstructive Methods
[69], [87], [88], [115] also adopt the hybrid architecture of MLPs The core idea of the reconstructive methods is forcing the network
and GNNs as the backbones. to encode the features in the graph and then recovering (part of)
8

the graph information from the learned embeddings. Thus, the

Encoder
reconstructive methods focus on the intra-data information in the

Graph
Input
graph and adopt the node attributes and graph structure as the
self-supervision signals. The general pipeline of reconstructive

Negative Samples
Positive Samples
Pull Together
deep graph clustering methods is illustrated in Figure 6. The core

Push Away
1
designs of the reconstructive methods are the encoder architecture, Data Augmentation 1
2
2
decoder architecture, and reconstructed objective functions. Re- 3 3
searchers improve reconstructive methods from these perspectives.

Augmented

Encoder
Graph
3.3.2 Adversarial Methods
The adversarial deep graph clustering methods aim to improve
the quality of features by creating a zero-sum game between the
generator and the discriminator. Specifically, the discriminator is
trained to recognize whether learned features are from the real Figure 8: The general pipeline of contrastive deep graph clustering
data distribution, while the generator aims to generate confusing methods. At first, the augmented graph is generated via data aug-
embeddings to cheat the discriminator. In these settings, the mentations, and nodes are embedded into the latent space by the
generation and discrimination operations provide effective self- encoders. Subsequently, the networks are guided to pull together
supervision signals, guiding the network to learn high-quality positive samples and push away negative samples. Finally, the
embeddings. Figure 7 demonstrates the general pipeline of the node embeddings from different views are fused and grouped into
adversarial deep graph clustering methods. The core technologies distinct clusters via the clustering method C .
determining the quality of adversarial methods contain generator
designs, discriminator designs, noise generation, and discrimina- learning paradigms. Besides, researchers [41], [66] also veri-
tive loss functions. Several works aim to improve the performance fied the effectiveness of the combination of reconstructive and
of 2022/11/22
adversarial02:00
methods from these aspects.
adversarial.drawio contrastive learning paradigms. Moreover, in AGC-DRR [115],
the researchers show that an adversarial mechanism is a new
Noise
Fake? option for data augmentation in contrastive learning. How to better
Discriminator
Generator

optimize multiple self-supervised tasks and make them cooperate


with each other is another crucial research topic [124], [125].

Real?
3.4 Clustering Method
Input 1 The clustering methods in deep graph clustering aim to separate
2
Graph 1 2 the learned node embeddings into different clusters in a purely
3 3
unsupervised manner. They roughly can be categorized into two
classes: traditional clustering and neural clustering.
Figure 7: The general pipeline of adversarial deep graph clus-
tering. Firstly, the Generator aims to generate high-quality node 3.4.1 Traditional Clustering
embeddings Z from the input graph. Subsequently, the Discrimi- The traditional clustering methods [18], [22], [126], [127], [128],
nator is trained to distinguish the fake information and the learned [129] can be directly performed on the learned node embeddings
features. After the self-supervised training, the clustering method to group them into disjoint clusters in many early deep graph
C separates the learned node embeddings into several clusters. clustering methods [16], [19], [21], [25], [31], [58], [59], [106].
Although they efficiently achieve promising performance, the
traditional clustering methods have two drawbacks as follows:
3.3.3 Contrastive Methods 1) First, the clustering process can not benefit from the strong
The critical idea in contrastive deep graph clustering methods is to representation capability of deep neural networks, thus limiting
improve the discriminativeness of features by pulling together the the discriminative capability of the samples. 2) Second, in these
positive samples while pushing away the negative ones. Thus, con- methods, the node representation learning and the clustering
trastive methods focus on the inter-data information and construct learning cannot be jointly optimized in an end-to-end manner,
the self-supervision signals via meaningful relationships between therefore leading to sub-optimal performance. 3) These methods
samples. The general pipeline of contrastive methods can be found are not easy to adopt batch training and batch inference techniques,
in Figure 8. The main techniques in contrastive methods include limiting the model scalability on large graphs.
graph data augmentations, siamese encoder designs, positive and
negative sample pair construction, negative sampling, counter- 3.4.2 Neural Clustering
active learning loss functions, etc. These aspects are carefully To alleviate the aforementioned issues of traditional clustering
modified to enhance the discriminative capability of contrastive methods, the neural clustering methods are designed to group
learning methods. the samples into different clusters with deep neural networks.
Concretely, in the neural clustering methods, the clustering process
3.3.4 Hybrid Methods and the deep neural network are jointly optimized by the gradient
In about:blank
recent years, some papers have demonstrated the effectiveness descent
1/1
algorithms [130], [131].
of combining different learning paradigms. For example, in ARGA For example, in many two-stage neural clustering methods
[25], Pan et al. adopt both the reconstructive and adversarial [28], [33], [34], [39], [40], [62], [66], [96], [111], a widely-used
2022/11/22 14:14 kl_loss.drawio 2022/11/22 15:55 pseudo_label.drawio

Step1: Step1:
Pre-train in self-supervised manner Pre-train in self-supervised manner

Input Graph
Input Graph

High Confidence
pseudo labels

1 1
Step2: Step2: Train with 3 2
Fine-tune supervision signal

Figure 9: The clustering distribution alignment loss LKL in the


neural clustering methods. In the first step, the graph encoder Figure 10: The clustering pseudo label technology in the neural
F is trained in a self-supervised manner. Then, the clustering clustering methods. In the first step, the encoder F is trained by
method obtains the initial clustering distribution Q. In the sec- the self-supervised pre-text tasks to obtain the discriminative node
ond step, the model sharpens the clustering distribution Q and embeddings Z. Then the clustering method C is performed on
gains the sharpened distribution P, which represents a clustering the learned node embeddings. In the second step, high-confidence
distribution with more confidence. The model will be fine-tuned samples are selected as the pseudo labels. These pseudo labels
by aligning the original distribution and sharpened distribution via serve as the supervision signal to further fine-tune the encoder F .
the KL divergence loss function.
shown in Figure 10. For instance, in GDCL, [62] debias the false
clustering distribution alignment loss is introduced to network negative sampling of contrastive learning with the pseudo clus-
optimization as shown in Figure 9. Specifically, in the first stage, tering labels. Besides, [104] proposes SCAGC to maximize the
these methods pre-train the encoders F in a self-supervised similarities of intra-cluster nodes while minimizing the similarities
manner. After that, they initialize the cluster center embeddings of inter-cluster nodes. Moreover, in IDCRN, [67] utilizes the high
C ∈ Rk×d by the traditional clustering algorithms on the learned confidence pseudo labels to refine the adjacency matrix and guide
node embeddings Z ∈ Rn×d , where n, k , and d denotes the node the learned embeddings to recover the affinity matrix even across
number, cluster number, latent feature dimension, respectively. views. In addition, [41] design a cross-entropy based objective
Note that the cluster center embeddings C are set to the learnable function to guide the network to classify the clustering pseudo
parameters of the deep neural networks, i.e., the tensors with labels in the SGCMC model. Furthermore, [96] collaborates the
gradients. In the second stage, a soft assignment between the node pseudo node classification task with clustering and augments a
embeddings Z and the cluster center embeddings C is calculated pseudo-label set to further improve the clustering performance.
asabout:blank
formulated: Although previous methods achieve promising performance,
1/1about:blank 1/1
the scalability of the KL divergence loss and pseudo-label promot-
− α+1 ing is limited. We analyze the complexity of the KL divergence
(1 + ∥zi − cj ∥2 /α) 2
Qij = P . (15) loss in Eq. (16) as follows. This KL divergence loss function
2 − α+1
j ′ (1 + ∥zi − cj ∥ /α)
2
optimizes the clustering distribution with the whole data, therefore
Here, the similarity between i-th node embedding zi and j - leading to high time and space costs. The time complexity and
th cluster center embedding cj is measured by the Student’s space complexity of calculating the KL divergence loss is O(nkd)
t-distribution kernel. α is the freedom degree of Student’s t- and O(nk + nd + kd), where n denotes the node number, k
distribution. Qij can be considered as the probability of assigning denotes the cluster number and d denotes the dimension number of
i-th node to j -th cluster, namely a soft assignment. Subsequently, latent features. On the large graphs, the node number n becomes
the clustering distribution is promoted by aligning the soft assign- very large, e.g., 111 million nodes on ogbn-papers100M [132].
ment Qij with the high confidence assignments Pij as follows. Therefore the KL divergence loss easily leads to the out-of-
memory error or the long running time problem. To alleviate these
XX Pij problems, Dink-Net [71] is proposed by the idea of cluster dilation
LKL = KL(P ||Q) = Pij log( ), (16)
i j
Qij and cluster shrink. Concretely, the graph encoders are first pre-
trained with a discriminative pre-text task. Then the cluster centers
where the high confidence (target) assignments Pij can be calcu-
are initialized on the learned node embeddings in the k -means++
lated by first raising Qij to the second pow and then normalizing
manner [18]. Note that these cluster centers are assigned as the
by the frequency per cluster as formulated:
tensors with gradient, thus can be optimized with gradient descent
Q2ij / i Qij
P
algorithms [130], [131]. Furthermore, the clustering distribution
Pij = P 2
P . (17) is optimized by minimizing the cluster dilation and cluster shrink
j ′ Qij ′ / i Qij ′
loss functions as follows.
Here, the distribution P is considered the sharpened clustering
distribution, which contains more high-confidence clustering in- Ldilation =
formation. By aligning the original clustering distribution Q and −1 X X
k k
(18)
2
sharpened clustering distribution P, the clusters are iteratively ∥ci − cj ∥ ,
(k − 1)k i=1 j=1,j̸=i
refined, therefore improving clustering performance.
Similarly, in other two-stage neural clustering methods [41], b k
[62], [67], [96], [104], the high confidence clustering pseudo 1 XX 2
Lshrink = ∥zi − cj ∥ , (19)
labels are adopted as the supervision signal for better clustering as bk i=1 j=1
10

Table 4: The statistical information of eleven attribute graph shown in Figure 3, the main challenges in deep graph clustering
datasets. “# Nodes” denotes the number of nodes, “# Edges” contain five aspects, i.e., graph data quality, stability, scalability,
denotes the number of edges, “# Feature Dims” denotes the discriminative capability, and unknown cluster number. In the
feature dimension number, and “# Classes” denotes the number following five sub-sections, we will analyze these challenges via
of categories. experiments and provide potential solutions and opportunities.
Dataset # Nodes # Edges # Feature Dims # Classes
4.1 Graph Data Quality
BAT 131 81 1,038 4
EAT 399 203 5,994 4 The existing conventional deep graph clustering methods always
UAT 1,190 239 13,599 4 are under the assumption that the input graphs are perfectly
Cora 2,708 5,278 1,433 7
CoraFull 19,793 63,421 8,710 70
correct, namely, the connections between nodes are complete and
CiteSeer 3,327 4,614 3,703 6 correct and the node information is also complete and precise.
ACM 3,025 1,870 13,128 3 However, these assumptions do not always hold, especially in the
DBLP 4,057 334 3,528 4
Amazon-Photo 7,650 119,081 745 8 industrial scenario. The real-world graph data are always of low
ogbn-arxiv 169,343 1,166,243 128 40 quality. The noises come from two aspects, i.e., nodes and edges.
Reddit 232,965 23,213,838 602 41
ogbn-products 2,449,029 61,859,140 100 47 Firstly, for the nodes in graphs, the node attributes may contain
ogbn-papers100M 111,059,956 1,615,685,872 128 172 error information or incomplete information. To be specific, for
the error information, they are caused by error records and easily
spreads in the whole graph during the message-passing process.
LDink-Net = Ldilation + Lshrink , (20) In addition, incomplete information contains two types, i.e., partly
attribute missing and completely attribute missing. The latter one
where the first term Ldilation denotes the cluster dilation loss, and is more difficult to process. Secondly, for the edges in graphs,
the second term Lshrink denotes the cluster shrink loss. Besides, they may also contain the error information and incomplete
Z ∈ Rn×d , C ∈ Rk,d , n, k, b, d denotes the node embeddings, information. Concretely, the error information denotes the error
cluster center embeddings, node number, cluster number, and connections between nodes, while the incomplete information
hidden feature dimension. In Eq. 18, the cluster dilation loss aims denotes the missing edges, which should exist in the correct graph.
to learn the network to push away different cluster centers. In Besides, the conventional graph convolutional network encoders
addition, in Eq. 19, the cluster shrink loss aims to pull the samples [2], [134] are based on the homophily assumption [135], i.e., the
to the cluster centers. Note that the cluster shrink loss pulls batch connected nodes contain similar semantics. However, in the real-
samples to the cluster centers and easily scales to large graphs. world scenario, the connected nodes may not have similar features.
Besides, each sample will be pulled to all cluster centers rather For example, in the heterophily graphs [136] such as fraudulent
than the corresponding nearest cluster centers. The authors claim networks, the hackers and regular users will construct dense
this design alleviates the confirmation bias problem [133]. The connections, but they do not share similar underlying semantics.
time complexity and space complexity of calculating LDink-Net is To verify this challenge in deep graph clustering, we conduct
O(bkd + k 2 d + bd) and O(bk + bd + kd), respectively. experiments on one of the representative state-of-the-art methods,
Although verified effective, the two-stage neural clustering HSAN [68], on five datasets, including Cora, CiteSeer, BAT, EAT,
methods heavily depend on the excellent initialization of cluster and UAT. The detailed information of the datasets can be found in
centers. And the precise cluster center initialization relies on the Table 4. The experimental results are demonstrated in Table 5 and
promising network pre-training and the auxiliary of traditional Table 6.
clustering methods like k -Means, leading to high training and In Table 5, we evaluate the clustering performance of HSAN
deployment costs. To overcome this shortcoming, many one- [68] with missing data. The missing information contains node
stage methods [64], [97], [115] have been gradually proposed attributes and edges. And the drop rates are set to 10%, 30%, 50%,
recently. For example, [64] proposes a new method termed SCGC 70%, and 90%, respectively. From the experimental results, three
with Laplacian smoothing and neighborhood-orient contrastive conclusions are drawn. Firstly, when the information on graphs
learning. Besides, in AGC-DRR [115], the clustering sub-network is missing, the deep graph clustering method can not achieve
is developed to directly predict the probabilities of cluster-ID promising performance. Secondly, the clustering performance
for each sample. Moreover, rather than dealing with clustering drops when the drop rate increases. Thirdly, the node attributes
as a downstream task, [97] proposes a unified framework termed are more critical to the clustering performance on some citation
GCC by minimizing the difference between node embeddings and graphs, such as the CiteSeer dataset, while the edges are more
reconstructed cluster embeddings. The one-stage neural clustering critical to the clustering performance on some airport activity
will be a promising future direction. In the next section, we graphs, such as the BAT dataset.
analyze and summarize the technical challenges and opportunities In addition, in Table 6, the performance of HSAN [68] with
in the deep graph clustering field in detail. noisy information is tested. Similarly, the noised information
contains the noised node attribute and noised edges. Concretely,
for the node attributes, we add the Gaussian noises to them, and
4 C HALLENGE & O PPORTUNITY the standard deviations are set to 0.01, 0.1, 1, and 10, respectively.
In recent years, we have witnessed the fast growth of deep Besides, for the edges in the graph, we add the Gaussian noises
graph clustering. More and more methods have been proposed to the adjacency matrix, and the standard deviations are set to
and achieved promising performance. However, most methods are 0.01, 0.1, 1, and 10, respectively. We have two conclusions from
under some perfect assumption, and there are still many challenges the experimental results. Firstly, the noises in attributes and edges
in the field of deep graph clustering. From this motivation, this limit the clustering performance. Secondly, when the noise rate
section aims to summarize the main technical challenges. As increases, the performance drops significantly.
11

Table 5: Graph data quality testing experiments. Clustering performance of HSAN [68] on five datasets with missing information
including dropped node attributes and dropped edges. Experimental results are the mean value of ten runs.
Attribute Drop Rate Edge Drop Rate
Origin
10% 30% 50% 70% 90% 10% 30% 50% 70% 90%
ACC 77.73 75.22 74.00 68.07 63.74 46.16 77.13 75.00 71.53 62.96 48.81
NMI 59.45 57.93 54.73 48.26 40.62 21.21 59.55 56.74 51.69 42.54 29.54
Cora
ARI 57.62 54.76 51.72 42.05 36.01 16.42 58.03 53.81 47.31 38.10 18.98
F1 76.16 73.70 72.78 67.72 62.91 45.39 74.97 72.09 68.48 59.34 47.12
ACC 71.36 69.64 64.52 62.26 51.21 36.13 70.83 67.66 62.48 53.64 55.06
NMI 45.03 42.01 35.56 32.33 19.83 8.66 44.53 42.12 35.30 26.80 27.50
CiteSeer
ARI 46.81 43.94 36.19 32.23 19.74 8.02 45.95 38.75 31.38 22.38 23.50
F1 63.63 62.25 56.98 54.09 45.08 28.75 63.44 59.53 56.06 49.59 48.84
ACC 77.86 77.61 75.32 74.30 76.08 70.74 65.90 56.74 49.87 48.35 51.15
NMI 53.00 53.34 50.19 49.40 50.96 45.29 41.70 30.36 22.33 24.82 22.75
BAT
ARI 51.16 51.26 47.42 46.25 48.61 42.26 36.32 23.94 15.40 15.42 17.09
F1 77.80 77.48 75.15 74.02 75.85 69.40 62.71 53.10 45.43 43.00 50.50
ACC 57.73 57.73 58.15 58.06 57.39 55.97 55.22 52.88 47.70 44.78 45.36
NMI 34.12 34.35 34.86 33.83 33.00 31.73 33.10 29.38 22.26 16.79 17.88
EAT
ARI 27.22 27.37 28.17 27.33 26.99 25.58 26.16 22.89 15.57 11.40 13.69
F1 58.09 57.98 58.31 58.21 57.20 55.85 51.02 47.87 45.79 44.43 43.42
ACC 56.16 56.89 55.18 55.29 54.45 55.77 52.55 46.69 42.86 42.94 47.90
NMI 26.89 26.60 25.51 23.55 23.77 24.68 20.30 14.58 13.57 15.04 19.06
UAT
ARI 25.21 24.49 23.11 23.23 22.77 24.79 19.40 13.82 09.63 11.98 16.28
F1 55.99 54.33 51.46 53.86 51.91 55.05 49.60 43.53 40.65 38.13 44.13

Table 6: Graph data quality testing experiments. Clustering performance of HSAN [68] on five datasets with noisy information including
noised node attributes and noised edges. Experimental results are the mean value of ten runs.

Standard Deviation of Attribute Noises Standard Deviation of Edge Noises


Origin
0.01 0.1 1 10 0.01 0.1 1 10
ACC 77.73 77.31 76.53 46.82 36.15 37.20 21.25 30.24 30.24
NMI 59.45 59.04 58.37 22.18 13.53 11.16 0.51 0.40 0.40
Cora
ARI 57.62 59.16 56.43 17.44 8.95, 8.53 0.48 0.02 0.02
F1 76.16 74.76 74.76 46.34 37.19 34.4 15.61 6.84 6.84
ACC 71.36 70.39 68.79 37.47 27.35 28.27 21.57 21.07 21.07
NMI 45.03 44.54 41.85 8.44 3.52 3.70 0.35 0.29 0.29
CiteSeer
ARI 46.81 46.16 43.64 8.06 1.55 2.21 0.20 0.00 0.00
F1 63.63 64.12 62.40 30.49 17.70 21.85 15.53 5.91 5.91
ACC 77.86 78.37 78.37 66.41 58.52 66.41 52.67 27.48 27.48
NMI 53.00 54.15 53.07 46.50 40.53 38.86 23.62 4.36 4.36
BAT
ARI 51.16 52.27 52.33 42.14 31.70 34.02 18.13 0.15, 0.15
F1 77.80 78.30 78.19 62.76 56.32 66.12 52.45 12.25 12.25
ACC 57.73 57.98 56.64 53.22 50.71 55.81 50.71 26.07 26.07
NMI 34.12 34.36 33.59 32.13 32.17 31.58 23.27 1.46 1.46
EAT
ARI 27.22 28.06 25.92 25.64 23.73 24.07 17.30 0.01 0.01
F1 58.09 58.11 57.12 48.54 45.92 56.37 51.62 11.24 11.24
ACC 56.16 55.57 56.97 53.39 54.17 47.00 40.25 25.38 25.38
NMI 26.89 25.59 26.23 24.33 24.88 15.72 9.26 0.50 0.50
UAT
ARI 25.21 25.41 25.60 22.02 23.88 15.41 9.10 0.00 0.00
F1 55.99 54.86 53.64 47.29 52.79 45.42 36.97 10.56 10.56

Next, we discuss the potential solutions. Firstly, for the in- clustering models is relatively weak. For example, for the classical
complete node attributes or edges, the imputation networks [137], clustering method such as k -Means [18], its promising perfor-
[138] can help alleviate the information missing issue. Besides, mance highly relies on the quality of the initial cluster centers.
for the error information, the denoising technologies [139], [140], Similarly, the performance of deep clustering techniques [29], [30]
[141] might effectively remove the noises in the node attributes is also sensitive to the initialization of the neural networks and
and edges. Additionally, to solve the homophily and heterophily trainable cluster centers. Next, we systematically analyze the ran-
problems, researchers propose various methods [74], [136]. domness in the deep graph clustering methods. It mainly consists
of two parts as follows. Firstly, one of the key factors in achieving
promising clustering performance is the excellent representation
4.2 Stability
capability of the learned node embeddings. Thus, the initialization
The stability of deep graph clustering algorithms is essential, es- and training processes of the embedding networks influence the
pecially in sensitive domains such as financial risk control, social stability of deep graph clustering methods. Secondly, a clustering
network anomaly detection, etc. Different from the supervised method with strong discriminative capability is another critical
methods or the semi-supervised methods, deep graph clustering factor. Therefore, initialization and optimization processes of the
methods group the nodes into disjoint clusters in a purely unsu- neural cluster centers also easily affect the stability of deep graph
pervised manner, i.e., without any human annotations. Therefore, clustering methods.
without the guidance of ground truth, the stability of deep graph
12

Table 7: Stability testing of deep graph clustering methods. Average clustering performance of eleven methods on six datasets. The
experimental results are obtained with ten runs and presented with standard deviation.

Dataset Metric k-Means AE DEC IDEC GAE DAEGC ARGA SDCN MVGRL DFCN DCRN
ACC 0.65 0.35 0.56 0.62 1.22 0.48 0.59 1.81 1.02 0.80 0.25
NMI 0.38 0.16 0.28 0.50 0.91 0.45 0.92 1.34 0.63 1.00 0.44
DBLP
ARI 0.39 0.43 0.39 0.60 1.40 0.52 0.91 2.01 0.50 1.50 0.46
F1 0.27 0.36 0.51 0.56 2.23 0.67 0.66 1.51 1.51 0.80 0.26
ACC 3.17 0.13 0.20 1.42 0.80 1.39 0.49 0.31 0.36 0.20 0.18
NMI 3.22 0.08 0.30 2.40 0.65 0.86 0.71 0.32 0.40 0.20 0.35
CiteSeer
ARI 3.02 0.14 0.36 2.65 1.18 1.24 0.70 0.43 0.73 0.30 0.30
F1 3.53 0.11 0.17 1.39 0.82 1.32 0.31 0.24 0.39 0.20 0.21
ACC 0.71 0.08 0.76 0.52 1.44 2.83 0.36 0.18 0.76 0.20 0.20
NMI 0.46 0.16 1.51 1.16 1.92 4.15 0.82 0.25 1.40 0.40 0.61
ACM
ARI 0.69 0.16 1.87 1.50 3.10 3.89 0.86 0.40 1.76 0.40 0.52
F1 0.74 0.08 0.74 0.48 1.33 2.79 0.35 0.19 0.72 0.20 0.20
ACC 0.76 0.08 0.08 0.08 2.48 0.01 2.30 0.81 1.79 0.80 0.13
NMI 1.33 0.30 0.05 0.08 2.79 0.03 2.76 0.83 1.31 1.00 0.24
Amazon-Photo
ARI 0.44 0.47 0.04 0.07 4.57 0.02 4.41 1.23 0.47 0.84 0.20
F1 0.51 0.20 0.12 0.11 1.76 0.02 1.95 1.49 2.39 0.31 0.12
ACC 0.01 0.31 0.09 0.34 0.81 0.03 0.12 1.30 0.52 0.07 0.07
NMI 0.02 0.57 0.14 0.29 3.54 0.03 0.17 2.04 1.45 0.13 0.08
PubMed
ARI 0.01 0.67 0.13 0.39 1.39 0.04 0.17 2.07 1.06 0.11 0.12
F1 0.01 0.29 0.10 0.32 0.85 0.02 0.13 1.21 0.36 0.07 0.08
ACC 1.10 0.19 0.45 0.31 0.81 1.00 0.43 0.40 2.95 0.81 0.60
NMI 0.84 0.25 0.24 0.28 0.75 0.73 0.25 0.39 3.95 0.41 0.35
CoraFULL
ARI 0.57 0.27 0.29 0.22 0.86 0.47 0.24 0.27 2.63 0.48 0.49
F1 1.09 0.30 0.58 0.41 0.75 1.33 0.41 0.43 2.87 0.87 0.76

We conduct experiments to test the stability of existing Secondly, the clustering algorithms need to group a large number
deep graph clustering methods on six datasets, including DBLP, of nodes into separate clusters.
CiteSeer, ACM, Amazon-Photo, PubMed, and CoraFULL. The For the first part, batch training [145] will be a good solution.
compared methods include DCRN [66], DFCN [34], MVGRL Concretely, researchers can partition the large graph into a mini-
[59], DAEGC [28], , SDCN [33], ARGA [25], GAE [4], DEC batch graph via sub-graph extraction technology like random walk
[29], IDEC [30], AE [142], k -Means [18]. All experimental [146] and then train the networks on the sub-graphs. Here, unlike
results are obtained with ten runs. And the results are reported images or texts, one of the essential characteristics of graph data
with standard deviation. The experimental results indicate that is the relational connections between nodes. Thus, keeping the
the stability of recent methods seems good. However, there are original connection information after the graph partition is vital
various problems with their implementation. Firstly, the classical for the model’s performance.
method k -Means initializes the cluster centers various times to find For the second part, given the learned node embeddings
excellent initialization. Secondly, most of the existing methods Z ∈ Rn×d , the classical k -Means clustering algorithm will take
such as DEC [29], IDEC [30], DAEGC [28], ARGA [25], SDCN O(tknd) time complexity and O(nd + kd) space complexity.
[33], MVGRL [59], DFCN [34], and DCRN [66] rely on the Here, n, k , and d are the node number, cluster number, and feature
well pre-trained node embeddings and cluster center embeddings. dimension number, respectively. Besides, t is the iteration number
And the manual trial processes of network pre-training and center of k -Means. On the large graph, the node number n is very large,
initialization seem to increase the stability of deep graph clustering such as 111M node number on ogbn-paper100M dataset [132],
methods. However, in the real-world scenario, implementing these easily leading to out-of-memory on GPU or the long calculation
procedures is costly. Thus, we consider that the stability of the time on CPU.
existing deep graph clustering methods is overestimated. It will be
To demonstrate this challenge in deep graph clustering meth-
a promising direction to improve the robustness and stability of
ods, we conduct experiments and theoretical analyses including
deep graph clustering methods.
comparison experiments, running cost evaluation, and complexity
In the existing studies, researchers conduct ten runs with
analyses. As shown in Table 8. Concretely, fifteen important
different random seeds to alleviate the randomness influence on
baselines are tested on four large graph datasets, including ogbn-
the experimental results. We consider that some optimization
arXiv, ogbn-products, Reddit, and ogbn-papers100M. The detailed
strategies [143], [144] might enhance the stability of deep graph
statistics of datasets are summarized in Table 4. The baselines
clustering.
contain k -Means [18], DEC [29], DCN [147], node2vec [148],
DGI [51], AGE [58], MinCutPool [35], MVGRL [59], BGRL
[55], GRACE [52], ProGCL [149], AGC-DRR [115], DCRN [66],
4.3 Scalability
S3 GC [70], and Dink-Net [71]. From these experimental results,
In this section, we discuss the scalability of deep graph clustering three conclusions are obtained as follows. Firstly, the classical
methods. Although achieving promising performance, most pre- method k -Means can scale to large graphs with million of nodes
vious deep graph clustering methods can not scale to large-scale but can not achieve promising performance since the represen-
graph datasets, such as ogbn-papers100M [132]. The heavy time tation learning capability is limited. Secondly, most of the deep
and memory costs mainly come from two parts, as follows. Firstly, graph clustering methods like MVGRL [59], AGE [58], GRACE
the graph encoders need to be trained on large-scale graphs. [52], and DCRN [66] fail to run on large graphs. The reasons
13

Table 8: Scalability testing experiments on four large graphs. “OOM” indicates that method raises the out-of-memory failure.

Dataset Metric k-Means DEC DCN node2vec DGI AGE MVGRL BGRL GRACE ProGCL AGC-DRR DCRN S3 GC Dink-Net

ACC 18.11 21.25 19.91 29.00 31.40 22.70 29.86 35.00 43.68
NMI 22.13 25.14 23.81 40.60 41.20 32.10 37.51 46.30 43.73
ogbn-arXiv OOM OOM OOM OOM OOM
ARI 7.43 10.28 8.25 19.00 22.30 13.00 25.74 27.00 35.22
F1 12.94 15.57 13.06 22.00 23.00 16.60 21.79 23.00 26.92
ACC 18.11 23.79 24.50 35.70 32.00 35.21 40.20 41.09
ogbn- NMI 22.13 24.47 21.92 48.90 46.70 46.59 53.60 50.78
OOM OOM OOM OOM OOM OOM
products ARI 7.43 9.05 10.96 17.00 17.40 19.87 23.00 21.08
F1 12.94 13.54 13.95 24.70 19.20 21.55 25.00 25.15
ACC 8.90 70.90 22.40 65.41 73.60 76.03
NMI 11.40 79.20 30.60 70.48 80.70 78.91
Reddit OOM OOM OOM OOM OOM OOM OOM OOM
ARI 2.90 64.00 17.00 63.42 74.50 71.34
F1 6.80 55.10 18.30 51.45 56.00 67.95
ACC 14.60 17.50 15.10 17.30 26.67
ogbn- NMI 37.33 38.00 41.60 45.30 54.92
OOM OOM OOM OOM OOM OOM OOM OOM OOM
papers100M ARI 7.54 11.20 9.60 11.00 18.01
F1 10.45 11.10 11.10 11.80 19.48

Table 9: Efficiency analyses of eight deep graph clustering methods. The left two columns denote the time and space complexity
analyses. The right two columns denote the GPU and time costs. Experimental results obtained on the Cora dataset. n, b, k , l, g , d′ ,
and d denote node number, batch size, cluster number, edge number, average degree of graph, attribute dimension, and latent feature
dimension.

Method Time Complexity (per iteration) Space Complexity GPU Memory Cost (MB) Time Cost (s)
DEC O(nkd) O(nk + nd + kd) 1294 14.59
node2vec O(bd) O(nd) - 111.03
DGI O(ld′ + nd2 ) O(l + nd + d2 ) 3798 19.03
MVGRL O(n2 d + nd2 ) O(n2 + nd + d2 ) 9466 168.20
GRACE O(n2 d + ld + d2 ) O(l + nd) 1292 44.77
BGRL O(ld + nd2 ) O(l + nd + d2 ) 1258 44.18
S3 GC O(ngd2 ) O(nd + bgd + d2 ) 1474 508.21
Dink-Net O(nkd + k2 d + bd) O(bk + bd + kd) 1248 35.09

are that they either process the graph diffusion matrix or do not learning. [66] observe the representation collapse problem, i.e.,
adopt batch training techniques. Thirdly, the scalable deep graph the encoder tends to embed all samples into the same cluster in
clustering methods S3 GC [70] and Dink-Net [71] can run on large deep graph clustering methods. To demonstrate this problem, we
graphs like ogbn-papers100M [132] with 111 million nodes and conduct extensive experiments, including three types: performance
achieve promising performance. S3 GC [70] conducts contrastive comparison experiments, similarity visualization experiments, and
learning on the graph and performs k -Means on the learned node embedding visualization experiments.
embeddings. In addition, Dink-Net [71] unifies the representation At first, as shown in Table 10, we conduct performance
learning and clustering optimization via cluster dilation and cluster comparison experiments on six datasets, including DBLP, Cite-
shrink loss functions. Besides, in Table 9, we analyze the time Seer, ACM, Amazon-Photo, PubMed, and CoraFull datasets. The
complexity and space complexity, and test the GPU memory costs detailed information of these datasets is listed in Table 4. We com-
and time costs of six deep graph clustering methods, including pare thirteen state-of-the-art methods including DEC [29], GAE
DEC [29], node2vec [148], DGI [51], MVGRL [59], BGRL [55], [4], DAEGC [28], ARGA [25], SDCN [33], DFCN [34], MVGRL
S3 GC [70], and Dink-Net [71]. [59], MCGC [60], SCAGC [104], SCGC [64], AGC-DDR [115],
Next, we discuss the potential techniques to improve the DCRN [66], and IDCRN [67]. From these experimental results,
scalability of deep graph clustering methods. For the first part, i.e., we have three observations as follows. Firstly, the performance
encoder training on the large-scale graph, the possible techniques of deep clustering methods such as DEC [29] is limited since
include the better sub-graph extraction method, more efficient it omits the graph structure of graph data. Secondly, various deep
batch training strategy, etc. For the second part, i.e., node clus- reconstructive graph clustering methods such as GAE [4], DAEGC
tering on the large-scale graph, the potential solutions contain the [28], SDCN [33], DFCN [34] significantly improve the clustering
mini-batch k -Means [150], finding cluster centers via the neural performance since they reconstruct the graph information and
network [71], calculating the clustering assignments in a mini- improve the discriminative capability of networks. Thirdly, the
batch manner [71], and bi-part graph clustering [151], [152], etc. recent contrastive methods like MVGRL [59], SCGC [64], and
DCRN [66] further enhance the sample discriminative capability
4.4 Discriminative Capability via pulling together positive sample pairs and pushing away the
One key factor of promising clustering performance is to learn negative sample pairs, achieving the most promising performance.
discriminative node embeddings, which help the clustering meth- In addition, to intuitively demonstrate this challenge, we con-
ods better group nodes into different clusters. Therefore, the duct the t-SNE visualization on the learned node embeddings. As
discriminative capability of the network is vital in self-supervised shown in Figure 11, Three observations can be found as follows.
14

Table 10: Discriminative capability testing experiments. Clustering performance of thirteen methods on six datasets

Dataset Metric DEC GAE DAEGC ARGA SDCN DFCN MVGRL MCGC SCAGC SCGC AGC-DDR DCRN IDCRN
ACC 58.16 61.21 62.05 64.83 68.05 76.00 42.73 58.92 79.42 67.25 80.41 79.66 82.08
NMI 29.51 30.80 32.49 29.42 39.50 43.70 15.41 33.69 49.05 38.64 49.77 48.95 52.70
DBLP
ARI 23.92 22.02 21.03 27.99 39.15 47.00 8.22 25.97 54.04 35.95 55.39 53.60 58.81
F1 59.38 61.41 61.75 64.97 67.71 75.70 40.52 50.39 78.88 67.06 79.90 79.28 81.47
ACC 55.89 61.35 64.54 61.07 65.96 69.50 68.66 64.76 61.16 71.02 68.32 70.86 71.40
NMI 28.34 34.63 36.41 34.40 38.71 43.90 43.66 39.11 32.83 45.25 43.28 45.86 46.77
CiteSeer
ARI 28.12 33.55 37.78 34.32 40.17 45.50 44.27 37.54 31.17 46.29 45.34 47.64 48.67
F1 52.62 57.36 62.20 58.23 63.62 64.30 63.71 59.64 56.82 62.24 64.82 65.83 66.27
ACC 84.33 84.52 86.94 86.29 90.45 90.90 86.73 91.64 91.83 89.80 92.55 91.93 92.58
NMI 54.54 55.38 56.18 56.21 68.31 69.40 60.87 70.71 71.28 66.60 72.89 71.56 73.17
ACM
ARI 60.64 59.46 59.35 63.37 73.91 74.90 65.07 76.63 77.29 72.37 79.08 77.56 79.18
F1 84.51 84.65 87.07 86.31 90.42 90.80 86.85 91.70 91.84 89.74 92.55 91.94 92.60
ACC 47.22 71.57 76.44 69.28 53.44 76.88 45.19 OOM 75.25 77.48 78.11 79.94 80.17
NMI 37.35 62.13 65.57 58.36 44.85 69.21 36.89 OOM 67.18 67.67 72.21 73.70 74.32
Amazon-Photo
ARI 18.59 48.82 59.39 44.18 31.21 58.98 18.79 OOM 56.86 58.48 61.15 63.69 64.10
F1 46.71 68.08 69.97 64.30 50.66 71.58 39.65 OOM 72.77 72.22 72.72 73.82 74.01
ACC 60.14 62.09 68.73 65.26 64.20 68.89 67.01 60.97 OOM 63.14 - 69.87 70.02
NMI 22.44 23.84 28.26 24.80 22.87 31.43 31.59 33.39 OOM 27.77 - 32.20 33.29
PubMed
ARI 19.55 20.62 29.84 24.35 22.30 30.64 29.42 29.25 OOM 24.73 - 31.41 32.67
F1 61.49 61.37 68.23 65.69 65.01 68.10 67.07 59.84 OOM 62.27 - 68.94 69.19
ACC 31.92 29.60 34.35 22.07 26.67 37.51 31.52 29.08 OOM OOM - 38.80 39.45
NMI 41.67 45.82 49.16 41.28 37.38 51.30 48.99 36.86 OOM OOM - 51.91 52.83
CoraFULL
ARI 16.98 17.84 22.60 12.38 13.63 24.46 19.11 13.15 OOM OOM - 25.25 25.97
F1 27.71 25.95 26.96 18.85 22.14 31.22 26.51 22.90 OOM OOM - 31.68 32.58

(a) Raw Data (b) DEC (c) GAE (d) ARGA (e) DFCN (f) IDCRN
Figure 11: Discriminative capability testing experiments. t-SNE visualization of node embeddings. The compared methods include
DEC [29], GAE [4], ARGA [25], DFCN [34], and IDCRN [67]. The first and second row is the result on ACM and DBLP dataset.

tering distribution. Thirdly, the sample discriminative capability


of the contrastive methods IDCRN [67] is strong, revealing the
clustering distribution well. We consider visualizing the learned
node embeddings as one of the intuitive ways to evaluate the
discriminative capability of deep graph clustering methods.
Furthermore, we calculate and visualize the pair-wise similar-
ity of the learned node embeddings. The results are demonstrated
in Figure 12. Here, we reorder the nodes and gather the nodes with
the same labels on the same side. Besides, the red pixel denotes
(a) GAE (b) MVGRL (c)SDCN (d) IDCRN the high similarity, and the blue pixel denotes the low similarity.
From the figure, two conclusions are drawn. Firstly, GAE [4] and
Figure 12: Discriminative capability testing experiments. Visual- MVGRL [59] have a trend to the representation collapse problem
ization of pairwise sample similarity. The compared methods are since all paired sample similarities are high. Secondly, SDCN [33]
GAE [4], MVGRL [59], SDCN [33], and IDCRN [67]. The first and IDCRN [67] alleviate the representation collapse problem via
row and second row is the result on ACM and CiteSeer dataset. the delivery operator and dual correlation reduction. As shown
in Figure 12 (c) and Figure 12 (d), the similarities between the
samples in the same cluster are relatively high. In contrast, the
Firstly, in the raw data, we can not observe any cluster shape in similarities between the samples in different clusters are relatively
the latent space. Secondly, the deep clustering method DEC [29], low. We think this similarity visualization method can be an
reconstructive deep graph clustering [4], [34], and adversarial deep excellent tool to test the sample discriminative capability of the
graph clustering method ARGA [25] can reveal part of the clus- deep graph clustering method.
15

Recently, we have witnessed the fast development of self- To solve this problem, one potential solution is to perform
supervised learning on graphs [153]. The reconstructive deep clustering based on density [154]. Concretely, the clustering net-
graph clustering methods [4], [28], [33], [34], [39] reconstruct work should be designed to assign the high-density data area as
the graph information such as node attribute and graph structure. the cluster centers and regard the low-density data area as the
Besides, the adversarial methods [25], [106], [115] generate the decision borders. Besides, another potential solution technology
adversarial samples and discriminate them from the real samples. is deep reinforcement learning [86], [155]. To be specific, in
In addition, the contrastive methods [66], [71], [104], [115] pull an unsupervised scenario, recognizing the cluster number in the
together the positive sample pairs while pushing away the negative graph data can be modeled as the Markov decision process and be
sample pairs. Previous researchers designed different promising handled with deep reinforcement neural networks. More cluster
pre-text tasks to enhance the sample discriminative capability number determination methods refer to [156]. In the next section,
of networks. In the future, to further enhance the discriminative we will introduce the applications of deep graph clustering in four
capability, the hard sample mining strategies [68], [149] will be domains.
an exciting future direction.
5 A PPLICATION & O PEN R ESOURCE
4.5 Unknown Cluster Number 5.1 Application
Most of the existing deep graph clustering methods consider the In recent years, we have witnessed the fast growth of deep graph
cluster number as a given value, which is usually the same as the clustering. Thanks to the researchers in this field, promising deep
classes of the ground truth. Benefiting from this setting, recent graph clustering methods are increasingly proposed. Benefiting
deep graph clustering methods can achieve promising perfor- from the strong data partitioning capability, deep graph clustering
mance. The remarkable success of recent deep graph clustering has been applied to various real-world application domains, such
algorithms relies on the pre-defined cluster number. However, as natural language processing, computer vision, social network
in most real-world scenarios, the cluster number is an unknown analyses, recommendation systems, bioinformatics, medical sci-
value. For example, in Internet social networks, the users can be ence, etc. The applications are demonstrated in Figure 14.
grouped into several clusters, but we can not know the specific Next, we introduce the applications in detail. In the computer
number of user groups. In addition, the number of user groups vision domain, deep graph clustering methods are applied to face
is the most important in the user grouping algorithm. Therefore, analysis [157], [158], co-saliency detection [159] and video anal-
we think the clustering performance of the existing deep graph yses [160]. Besides, document mining [161], speech separation
clustering methods is overestimated. [162], and large language models [163] are essential applications
of deep graph clustering in the natural language processing area.
In addition, deep graph clustering is crucial for social network
analyses. It can be used to conduct community detection [164],
[165], [166], [167] and anomaly detection [168], [169]. Similarly,
deep graph clustering demonstrates the high application values
in the recommendation systems. To be specific, it can help user
grouping recommendations and user intent extraction [170]. Apart
from social data mining, deep graph clustering is also vital to
bioinformatics and medical science. Concretely, the applications
in the bioinformatics field include molecular mining [171], [172],
(a) DBLP (b) CiteSeer
metagenomic binning [173], single-cell RNA sequencing [43],
Figure 13: Unknown cluster number experiments. Clustering per- etc. Also, in the medical science domain, deep graph clustering
formance of DCRN [66] with different cluster numbers K ∈ methods are adopted in disease analyses [174], medical big data
[2, 10] on Cora and CiteSeer datasets. [175], and medical image [176]. In the future, we hope the
researchers will further solve the challenges and apply deep graph
In order to verify our suspect, we conduct experiments on clustering methods to a broader range and more significant fields.
two datasets, including DBLP and CiteSeer. Concretely, we adopt
DCRN [66] as the baseline and evaluate its clustering performance 5.2 Open Resource
with NMI and ARI metrics using different cluster numbers ranging In order to construct an open, active, and collaborative research
from 2 to 10. The experimental results are demonstrated in Figure & engineering atmosphere in the deep graph clustering field, we
13. We carefully analyze the results the draw two conclusions as make efforts to the open-source projects of deep graph clustering.
follows. Firstly, the baseline DCRN achieves promising and best Concretely, we make a deep graph clustering collection and build
performance when the cluster number equals the class number of a unified framework of deep graph clustering methods. We hope
ground truth. Here, the sample class number of the DBLP dataset these open resources will help the researchers quickly startup,
is 4, and the sample class number of CiteSeer is 6. Secondly, other solve critical problems, and apply deep graph clustering methods
cluster numbers will lead to performance degeneration. When to a boarder range of applications. Next, we introduce these two
the cluster number reduces, many clusters will collapse into one GitHub repositories in detail.
cluster. Besides, when the cluster number increases, the large
cluster will be split into many small clusters. Therefore, the cluster 5.2.1 Awesome Deep Graph Clustering
number is crucial for the clustering performance, and designing Awesome Deep Graph Clustering (ADGC) is a collection
the deep graph clustering methods without the pre-defined cluster of state-of-the-art, novel deep graph clustering methods, in-
number will be a challenging and meaningful direction. cluding papers, codes, and datasets. In addition, this project
16

User Grouping User Interest


Recommendation Extraction
Document Face
Mining Analyses

Recommendation
System Computer Co-saliency
Speech Natural Language
Separation Vision Detection
Processing

Video
Large Langugage Applications of Anlyses
Models Deep Graph Clustering

Community Social Network Metagenomic


Detection Analyses Bioinformatics Binning
Medical Science

Anomaly Molecular
Detection Mining
Single-cell
Disease Medical Medical RNA Sequencing
Analyses Big Data Image

Figure 14: Applications of deep graph clustering in six domains: natural language processing, computer vision, social network analyses,
recommendation systems, bioinformatics, and medical science.

open-sources some common data processing, clustering, met- open resources on the GitHub platform. Among them, Awesome
ric calculation, and visualization functions. The project ad- Deep Graph Clustering is a comprehensive collection of deep
dress can be found at GitHub (https://github.com/yueliu1999/ graph clustering, including papers, codes, and datasets. Besides,
Awesome-Deep-Graph-Clustering). Up to this paper submission A-Unified-Framework-for-Deep-Attribute-Graph-Clustering is a
time, we have collected about 130 papers and codes and about unified framework of deep graph clustering. We hope this work
20 datasets. Besides, this GitHub repository has gained about becomes a quick guide for the researchers and motivates them to
500 stars and 100 forks. We welcome any issues, pull requests, solve significant problems in deep graph clustering.
contributions, discussions, etc.

5.2.2 A Unified Framework of Deep Graph Clustering ACKNOWLEDGMENTS


A-Unified-Framework-for-Deep-Attribute-Graph-Clustering This work was supported by the National Key R&D Pro-
aims to build a unified framework for the deep gram of China (project no. 2020AAA0107100, 2021ZD0140406,
graph clustering methods. The project address can 2022ZD0115100), the National Natural Science Foundation of
be found at GitHub (https://github.com/Marigoldwu/ China (project no, 62325604, 62276271, U21A20427, 62006237,
A-Unified-Framework-for-Deep-Attribute-Graph-Clustering). 61922088, 61906020 and 61773392), the Research Center for
This project redesigns the architecture of the codes of ADGC Industries of the Future (Project WU2022C043), and the Com-
so that helps researchers conduct experiments easily. The tool petitive Research Fund (Project WU2022A009) from the West-
classes and functions are defined to simplify the codes and clarify lake Center for Synthetic Biology and Integrated Bioengineering.
the settings configuration. Besides, we thank Benyu Wu (https://marigoldwu.github.io/) for
his significant contributions to A-Unified-Framework-for-Deep-
6 C ONCLUSION Attribute-Graph-Clustering project.
In this work, we make a comprehensive survey about deep graph
clustering to help researchers review, summarize, and plan for the
future. Concretely, we first demonstrate the general pipeline of R EFERENCES
deep graph clustering and give a formal definition. Besides, we [1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, 2015.
introduce the critical development and evaluation of deep graph [2] T. N. Kipf and M. Welling, “Semi-supervised classification with graph
clustering. Secondly, a structured taxonomy is provided from four convolutional networks,” in Proc. of ICLR, 2017.
[3] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and
perspectives, including graph type, network architecture, learning Y. Bengio, “Graph attention networks,” in Proc. of ICLR, 2018.
paradigm, and clustering method. Moreover, through extensive [4] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” arXiv
experiments, we analyze and summarize the challenges in this preprint arXiv:1611.07308, 2016.
domain, such as the problems of graph data quality, stability, [5] X. Yang, Y. Liu, S. Zhou, X. Liu, and E. Zhu, “Mixed graph con-
trastive network for semi-supervised node classification,” arXiv preprint
scalability, discriminative capability, and unknown cluster number. arXiv:2206.02796, 2022.
Also, the potential technical solutions are carefully discussed. [6] K. Liang, L. Meng, M. Liu, Y. Liu, W. Tu, S. Wang, S. Zhou, and X. Liu,
Thirdly, we introduce the applications of deep graph clustering in “Learn from relational correlations and periodic events for temporal
knowledge graph reasoning,” in Proc. of SIGIR, 2023.
six domains, including social network analysis, recommendation
[7] K. Liang, L. Meng, S. Zhou, S. Wang, W. Tu, Y. Liu, M. Liu, and
systems, computer vision, natural language processing, bioinfor- X. Liu, “Message intercommunication for inductive relation reasoning,”
matics, and medical science. More importantly, we contribute two arXiv preprint arXiv:2305.14074, 2023.
17

[8] K. Liang, J. Tan, D. Zeng, Y. Huang, X. Huang, and G. Tan, “Abslearn: a [36] A. Tsitsulin, J. Palowitch, B. Perozzi, and E. Müller, “Graph clustering
gnn-based framework for aliasing and buffer-size information retrieval,” with graph neural networks,” Journal of Machine Learning Research,
Pattern Analysis and Applications, 2023. 2023.
[9] S. Zhou, H. Xu, Z. Zheng, J. Chen, J. Bu, J. Wu, X. Wang, W. Zhu, [37] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec,
M. Ester et al., “A comprehensive survey on deep clustering: Taxonomy, “Hierarchical graph representation learning with differentiable pooling,”
challenges, and future directions,” arXiv preprint arXiv:2206.07579, Proc. of NeurIPS, 2018.
2022. [38] J. Lee, I. Lee, and J. Kang, “Self-attention graph pooling,” in Proc. of
[10] Y. Ren, J. Pu, Z. Yang, J. Xu, G. Li, X. Pu, P. S. Yu, and L. He, “Deep ICML, 2019.
clustering: A comprehensive survey,” arXiv preprint arXiv:2210.04142, [39] S. Fan, X. Wang, C. Shi, E. Lu, K. Lin, and B. Wang, “One2multi graph
2022. autoencoder for multi-view graph clustering,” in Proc. of WWW, 2020.
[11] E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel, and D. Cremers, [40] J. Cheng, Q. Wang, Z. Tao, D. Xie, and Q. Gao, “Multi-view attribute
“Clustering with deep learning: Taxonomy and new methods,” arXiv graph convolution networks for clustering,” in Proceedings of the
preprint arXiv:1801.07648, 2018. Twenty-Ninth International Conference on International Joint Confer-
[12] E. Min, X. Guo, Q. Liu, G. Zhang, J. Cui, and J. Long, “A survey ences on Artificial Intelligence, 2021.
of clustering with deep learning: From the perspective of network [41] W. Xia, Q. Wang, Q. Gao, X. Zhang, and X. Gao, “Self-supervised
architecture,” IEEE Access, 2018. graph convolutional network for multi-view clustering,” IEEE Trans.
[13] S. Wang, J. Yang, J. Yao, Y. Bai, and W. Zhu, “An overview of advanced Multim., 2022.
deep graph node clustering,” IEEE Transactions on Computational [42] N. Mrabah, M. Bouguessa, M. F. Touati, and R. Ksantini, “Rethink-
Social Systems, 2023. ing graph auto-encoder models for attributed graph clustering,” IEEE
[14] S. Xing, X. Shan, L. Fanzhen, W. Jia, Y. Jian, Z. Chuan, H. Wenbin, Transactions on Knowledge and Data Engineering, 2022.
P. Cecile, N. Surya, J. Di et al., “A comprehensive survey on community [43] Z. Yu, Y. Lu, Y. Wang, F. Tang, K.-C. Wong, and X. Li, “Zinb-based
detection with deep learning,” IEEE Trans. Neural Netw. Learn. Syst, graph embedding autoencoder for single-cell rna-seq interpretations,” in
2022. Proc. of AAAI, 2022.
[15] F. Liu, S. Xue, J. Wu, C. Zhou, W. Hu, C. Paris, S. Nepal, J. Yang, and [44] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bach-
P. S. Yu, “Deep learning for community detection: progress, challenges man, A. Trischler, and Y. Bengio, “Learning deep representations by
and opportunities,” in Proceedings of the Twenty-Ninth International mutual information estimation and maximization,” in Proc. of ICLR,
Conference on International Joint Conferences on Artificial Intelli- 2018.
gence, 2021. [45] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for
[16] F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu, “Learning deep unsupervised visual representation learning,” in Proc. of CVPR, 2020.
representations for graph clustering,” in Proc. of AAAI, 2014. [46] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework
[17] A. Ng et al., “Sparse autoencoder,” CS294A Lecture notes, 2011. for contrastive learning of visual representations,” in Proc. of ICML,
[18] J. A. Hartigan and M. A. Wong, “Algorithm as 136: A k-means 2020.
clustering algorithm,” Journal of the royal statistical society. series c [47] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond,
(applied statistics), 1979. E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar
[19] S. Cao, W. Lu, and Q. Xu, “Deep neural networks for learning graph et al., “Bootstrap your own latent: A new approach to self-supervised
representations,” in Proc. of AAAI, 2016. learning,” arXiv preprint arXiv:2006.07733, 2020.
[20] S. Chang, W. Han, J. Tang, G.-J. Qi, C. C. Aggarwal, and T. S. Huang, [48] J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny, “Barlow twins:
“Heterogeneous network embedding via deep architectures,” in Proc. of Self-supervised learning via redundancy reduction,” arXiv preprint
KDD, 2015. arXiv:2103.03230, 2021.
[49] X. Yang, X. Hu, S. Zhou, X. Liu, and E. Zhu, “Interpolation-based
[21] C. Wang, S. Pan, G. Long, X. Zhu, and J. Jiang, “Mgae: Marginalized
contrastive learning for few-label semi-supervised learning,” IEEE
graph autoencoder for graph clustering,” in Proceedings of the 2017
Transactions on Neural Networks and Learning Systems, 2022.
ACM on Conference on Information and Knowledge Management,
[50] X. Yang, J. Jin, S. Wang, K. Liang, Y. Liu, Y. Wen, S. Liu, S. Zhou,
2017.
X. Liu, and E. Zhu, “Dealmvc: Dual contrastive calibration for multi-
[22] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and
view clustering,” arXiv preprint arXiv:2308.09000, 2023.
computing, 2007.
[51] P. Velickovic, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D.
[23] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
Hjelm, “Deep graph infomax.” ICLR (Poster), 2019.
S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,”
[52] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, “Deep Graph
Proc. of NeurIPS, 2014.
Contrastive Representation Learning,” in ICML Workshop on Graph
[24] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang, “Adversarially Representation Learning and Beyond, 2020.
regularized graph autoencoder for graph embedding,” in Proc. of IJCAI, [53] J. Xia, L. Wu, J. Chen, B. Hu, and S. Z. Li, “Simgrace: A simple
2018. framework for graph contrastive learning without data augmentation,”
[25] S. Pan, R. Hu, S.-f. Fung, G. Long, J. Jiang, and C. Zhang, “Learning arXiv preprint arXiv:2202.03104, 2022.
graph embedding with adversarial training methods,” IEEE transactions [54] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, “Graph
on cybernetics, 2019. contrastive learning with augmentations,” Proc. of NeurIPS, 2020.
[26] H. Gao, J. Pei, and H. Huang, “Progan: Network embedding via [55] S. Thakoor, C. Tallec, M. G. Azar, R. Munos, P. Veličković, and
proximity generative adversarial network,” in Proc. of KDD, 2019. M. Valko, “Bootstrapped representation learning on graphs,” in ICLR
[27] Y. Jia, Q. Zhang, W. Zhang, and X. Wang, “Communitygan: Community 2021 Workshop on Geometrical and Topological Representation Learn-
detection with generative adversarial nets,” in Proc. of WWW, 2019. ing, 2021.
[28] C. Wang, S. Pan, R. Hu, G. Long, J. Jiang, and C. Zhang, “At- [56] K. Liang, Y. Liu, S. Zhou, X. Liu, and W. Tu, “Relational symmetry
tributed graph clustering: A deep attentional embedding approach,” based knowledge graph contrastive learning,” 2022.
arXiv preprint arXiv:1906.06532, 2019. [57] Y. Zhu, Y. Xu, Q. Liu, and S. Wu, “An empirical study of graph
[29] J. Xie, R. Girshick, and A. Farhadi, “Unsupervised deep embedding for contrastive learning,” arXiv preprint arXiv:2109.01116, 2021.
clustering analysis,” in Proc. of ICML, 2016. [58] G. Cui, J. Zhou, C. Yang, and Z. Liu, “Adaptive graph encoder for
[30] X. Guo, L. Gao, X. Liu, and J. Yin, “Improved deep embedded attributed graph embedding,” in Proc. of KDD, 2020.
clustering with local structure preservation.” in Proc. of IJCAI, 2017. [59] K. Hassani and A. H. Khasahmadi, “Contrastive multi-view representa-
[31] J. Park, M. Lee, H. J. Chang, K. Lee, and J. Y. Choi, “Symmetric tion learning on graphs,” in Proc. of ICML, 2020.
graph convolutional autoencoder for unsupervised graph representation [60] E. Pan and Z. Kang, “Multi-view contrastive graph clustering,” Proc. of
learning,” in Proc. of ICCV, 2019. NeurIPS, 2021.
[32] X. Zhang, H. Liu, Q. Li, and X.-M. Wu, “Attributed graph clustering [61] X. Wang, N. Liu, H. Han, and C. Shi, “Self-supervised heterogeneous
via adaptive graph convolution,” in Proc. of IJCAI, 2019. graph neural network with co-contrastive learning,” in Proc. of KDD,
[33] D. Bo, X. Wang, C. Shi, M. Zhu, E. Lu, and P. Cui, “Structural deep 2021.
clustering network,” in Proc. of WWW, 2020. [62] H. Zhao, X. Yang, Z. Wang, E. Yang, and C. Deng, “Graph debiased
[34] W. Tu, S. Zhou, X. Liu, X. Guo, Z. Cai, J. Cheng et al., “Deep fusion contrastive learning with joint representation clustering,” in Proc. of
clustering network,” arXiv preprint arXiv:2012.09600, 2020. IJCAI, 2021.
[35] F. M. Bianchi, D. Grattarola, and C. Alippi, “Spectral clustering with [63] N. Lee, J. Lee, and C. Park, “Augmentation-free self-supervised learning
graph neural networks for graph pooling,” in Proc. of ICML, 2020. on graphs,” arXiv preprint arXiv:2112.02472, 2021.
18

[64] Y. Liu, X. Yang, S. Zhou, and X. Liu, “Simple contrastive graph clus- [92] P.-Y. Huang, R. Frederking et al., “Rwr-gae: Random walk regulariza-
tering,” IEEE Transactions on Neural Networks and Learning Systems, tion for graph auto encoders,” arXiv preprint arXiv:1908.04003, 2019.
2023. [93] H. Zhang, P. Li, R. Zhang, and X. Li, “Embedding graph auto-encoder
[65] Y. Liu, Y. Zheng, D. Zhang, H. Chen, H. Peng, and S. Pan, “Towards for graph clustering,” IEEE Transactions on Neural Networks and
unsupervised deep graph structure learning,” in Proceedings of the ACM Learning Systems, 2022.
Web Conference 2022, 2022. [94] W. Li, S. Wang, X. Guo, and E. Zhu, “Deep graph clustering with
[66] Y. Liu, W. Tu, S. Zhou, X. Liu, L. Song, X. Yang, and E. Zhu, “Deep multi-level subspace fusion,” Pattern Recognition, 2023.
graph clustering via dual correlation reduction,” in Proc. of AAAI, 2022. [95] J. Li, J. Yu, J. Li, H. Zhang, K. Zhao, Y. Rong, H. Cheng, and J. Huang,
[67] Y. Liu, S. Zhou, X. Liu, W. Tu, and X. Yang, “Improved dual correlation “Dirichlet graph variational autoencoder,” Proc. of NeurIPS, 2020.
reduction network,” arXiv preprint arXiv:2202.12533, 2022. [96] P. Zhu, J. Li, Y. Wang, B. Xiao, S. Zhao, and Q. Hu, “Collaborative
[68] Y. Liu, X. Yang, S. Zhou, X. Liu, Z. Wang, K. Liang, W. Tu, L. Li, decision-reinforced self-supervision for attributed graph clustering,”
J. Duan, and C. Chen, “Hard sample aware network for contrastive IEEE Transactions on Neural Networks and Learning Systems, 2022.
deep graph clustering,” in Proc. of AAAI, 2023. [97] H. Zhong, J. Wu, C. Chen, J. Huang, M. Deng, L. Nie, Z. Lin, and X.-S.
[69] X. Yang, Y. Liu, S. Zhou, S. Wang, W. Tu, Q. Zheng, X. Liu, L. Fang, Hua, “Graph contrastive clustering,” in Proc. of ICCV, 2021.
and E. Zhu, “Cluster-guided contrastive graph clustering network,” in [98] S. Ding, B. Wu, X. Xu, L. Guo, and L. Ding, “Graph clustering network
Proc. of AAAI, 2023. with structure embedding enhanced,” Pattern Recognition, 2023.
[70] F. Devvrit, A. Sinha, I. Dhillon, and P. Jain, “S3gc: Scalable self- [99] N. Mrabah, M. Bouguessa, and R. Ksantini, “Escaping feature twist: A
supervised graph clustering,” 2022. variational graph auto-encoder for node clustering,” Proceedings of the
[71] Y. Liu, K. Liang, J. Xia, S. Zhou, X. Yang, X. Liu, and S. Z. Li, “Dink- 31st IJCAI.
net: Neural clustering on large graphs,” in Proc. of ICML, 2023. [100] L. Guo and Q. Dai, “Graph clustering via variational graph embedding,”
[72] W. Shiao, U. S. Saini, Y. Liu, T. Zhao, N. Shah, and E. E. Papalex- Pattern Recognition, 2022.
akis, “Carl-g: Clustering-accelerated representation learning on graphs,” [101] C. Wang, S. Pan, P. Y. Celina, R. Hu, G. Long, and C. Zhang, “Deep
ACM SIGKDD Explorations Newsletter, 2023. neighbor-aware embedding for node clustering in attributed graphs,”
[73] L. Sun, F. Wang, J. Ye, H. Peng, and P. S. Yu, “Contrastive graph Pattern Recognition, 2022.
clustering in curvature spaces,” 2023. [102] L. Yang, Y. Wang, J. Gu, C. Wang, X. Cao, and Y. Guo, “Jane: Jointly
[74] E. Pan and Z. Kang, “Beyond homophily: Reconstructing structure for adversarial network embedding.” in Proc. of IJCAI, 2020.
graph-agnostic clustering,” 2023. [103] L. Yu, S. Pei, L. Ding, J. Zhou, L. Li, C. Zhang, and X. Zhang, “Sail:
[75] Y. Wen, S. Wang, Q. Liao, W. Liang, K. Liang, X. Wan, and X. Liu, Self-augmented graph contrastive learning,” in Proc. of AAAI, 2022.
“Unpaired multi-view graph clustering with cross-view structure match- [104] W. Xia, Q. Wang, Q. Gao, M. Yang, and X. Gao, “Self-consistent
ing,” IEEE Transactions on Neural Networks and Learning Systems, contrastive attributed graph clustering with pseudo-label prompt,” IEEE
2023. Transactions on Multimedia, 2022.
[76] J.-O. Palacio-Niño and F. Berzal, “Evaluation metrics for unsupervised [105] T. Zhang, Y. Xiong, J. Zhang, Y. Zhang, Y. Jiao, and Y. Zhu, “Commdgi:
learning algorithms,” arXiv preprint arXiv:1905.05667, 2019. community detection oriented deep graph infomax,” in Proc. of CIKM,
[77] A. Lutov, M. Khayati, and P. Cudré-Mauroux, “Accuracy evaluation 2020.
of overlapping and multi-resolution clustering algorithms on large [106] Z. Tao, H. Liu, J. Li, Z. Wang, and Y. Fu, “Adversarial graph embedding
datasets,” in 2019 IEEE International Conference on Big Data and for ensemble clustering,” in International Joint Conferences on Artificial
Smart Computing (BigComp), 2019. Intelligence Organization, 2019.
[78] A. F. McDaid, D. Greene, and N. Hurley, “Normalized mutual infor- [107] H. Liang and J. Gao, “Wasserstein adversarially regularized graph
mation to evaluate overlapping community finding algorithms,” arXiv autoencoder,” Neurocomputing, 2023.
preprint arXiv:1110.2515, 2011. [108] H. Sun, Y. Li, B. Lv, W. Yan, L. He, S. Qiao, and J. Huang, “Graph
[79] K. Y. Yeung and W. L. Ruzzo, “Details of the adjusted rand index community infomax,” ACM Transactions on Knowledge Discovery from
and clustering algorithms, supplement to the paper an empirical study Data (TKDD), 2021.
on principal component analysis for clustering gene expression data,” [109] T. Wang, G. Yang, Q. He, Z. Zhang, and J. Wu, “Ncagc: A neighborhood
Bioinformatics, 2001. contrast framework for attributed graph clustering,” arXiv preprint
[80] M. D. Plummer and L. Lovász, Matching theory. Elsevier, 1986. arXiv:2206.07897, 2022.
[81] S. Aranganayagi and K. Thangavel, “Clustering categorical data us- [110] H. Xu, W. Xia, Q. Gao, J. Han, and X. Gao, “Graph embedding cluster-
ing silhouette coefficient as a relocating measure,” in International ing: Graph attention auto-encoder with cluster-specificity distribution,”
conference on computational intelligence and multimedia applications Neural Networks, 2021.
(ICCIMA 2007), 2007. [111] Z. Peng, H. Liu, Y. Jia, and J. Hou, “Attention-driven graph clustering
[82] X. Wang and Y. Xu, “An improved index for clustering validation based network,” in Proc. of ACM MM, 2021.
on silhouette index and calinski-harabasz index,” in IOP Conference [112] ——, “Deep attention-guided graph clustering with dual self-
Series: Materials Science and Engineering, 2019. supervision,” IEEE Transactions on Circuits and Systems for Video
[83] S. Petrovic, “A comparison between the silhouette index and the davies- Technology, 2022.
bouldin index in labelling ids clusters,” in Proceedings of the 11th [113] Z. Zhong, G. Gonzalez, D. Grattarola, and J. Pang, “Unsupervised
Nordic workshop of secure IT systems, 2006. network embedding beyond homophily,” Transactions on Machine
[84] D. Jin, B. Li, P. Jiao, D. He, and W. Zhang, “Network-specific varia- Learning Research, 2022.
tional auto-encoder for embedding in attribute networks.” in Proc. of [114] G. K. Kulatilleke, M. Portmann, and S. S. Chandra, “Scgc:
IJCAI, 2019. Self-supervised contrastive graph clustering,” arXiv preprint
[85] C. Yang, M. Liu, Z. Wang, L. Liu, and J. Han, “Graph clustering with arXiv:2204.12656, 2022.
dynamic embedding,” arXiv preprint arXiv:1712.08249, 2017. [115] L. Gong, S. Zhou, X. Liu, and W. Tu, “Attributed graph clustering with
[86] Y. Liu, K. Liang, J. Xia, X. Yang, S. Zhou, M. Liu, X. Liu, and S. Z. Li, dual redundancy reduction,” in Proc. of IJCAI, 2022.
“Reinforcement graph clustering with unknown cluster number,” 2023. [116] D. Luo, J. Ni, S. Wang, Y. Bian, X. Yu, and X. Zhang, “Deep multi-
[87] X. Yang, C. Tan, Y. Liu, K. Liang, S. Wang, S. Zhou, J. Xia, S. Z. Li, graph clustering via attentive cross-graph association,” in Proceedings
X. Liu, and E. Zhu, “Convert: Contrastive graph clustering with reliable of the 13th international conference on web search and data mining,
augmentation,” arXiv preprint arXiv:2308.08963, 2023. 2020.
[88] X. Yang, Y. Liu, S. Zhou, S. Wang, X. Liu, and E. Zhu, “Contrastive [117] R. A. Khan and M. Kleinsteuber, “Cluster-aware heterogeneous infor-
deep graph clustering with learnable augmentation,” arXiv preprint mation network embedding,” in Proc. of WSDM, 2022.
arXiv:2212.03559, 2022. [118] T. Li, W. Wang, P. Jiao, Y. Wang, R. Ding, H. Wu, L. Pan, and D. Jin,
[89] B. Hui, P. Zhu, and Q. Hu, “Collaborative graph convolutional networks: “Exploring temporal community structure via network embedding,”
Unsupervised learning meets semi-supervised learning,” in Proc. of IEEE Transactions on Cybernetics, 2022.
AAAI, 2020. [119] M. Liu, Y. Liu, K. Liang, S. Wang, S. Zhou, and X. Liu, “Deep temporal
[90] R. Hu, S. Pan, G. Long, Q. Lu, L. Zhu, and J. Jiang, “Going deep: graph clustering,” arXiv preprint arXiv:2305.10738, 2023.
Graph convolutional ladder-shape networks,” in Proc. of AAAI, 2020. [120] N. Park, R. Rossi, E. Koh, I. A. Burhanuddin, S. Kim, F. Du, N. Ahmed,
[91] Y. Hu, X. Li, Y. Wang, Y. Wu, Y. Zhao, C. Yan, J. Yin, and Y. Gao, and C. Faloutsos, “Cgc: Contrastive graph clustering forcommunity
“Adaptive hypergraph auto-encoder for relational data clustering,” IEEE detection and tracking,” in Proceedings of the ACM Web Conference
Transactions on Knowledge and Data Engineering, 2021. 2022, 2022.
19

[121] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of [152] W. Xia, Q. Gao, Q. Wang, X. Gao, C. Ding, and D. Tao, “Tensorized
gated recurrent neural networks on sequence modeling,” arXiv preprint bipartite graph learning for multi-view clustering,” IEEE Transactions
arXiv:1412.3555, 2014. on Pattern Analysis and Machine Intelligence, 2022.
[122] M. Liu, K. Liang, Y. Liu, S. Wang, S. Zhou, and X. Liu, “arxiv4tgc: [153] Y. Liu, M. Jin, S. Pan, C. Zhou, Y. Zheng, F. Xia, and S. Y. Philip,
Large-scale datasets for temporal graph clustering,” arXiv preprint “Graph self-supervised learning: A survey,” IEEE Transactions on
arXiv:2306.04962, 2023. Knowledge and Data Engineering, 2022.
[123] S. K. Pal and S. Mitra, “Multilayer perceptron, fuzzy sets, classifiac- [154] A. Rodriguez and A. Laio, “Clustering by fast search and find of density
tion,” 1992. peaks,” science, 2014.
[124] W. Jin, X. Liu, X. Zhao, Y. Ma, N. Shah, and J. Tang, “Automated [155] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath,
self-supervised learning for graphs,” in Proc. of ICLR, 2021. “Deep reinforcement learning: A brief survey,” IEEE Signal Processing
[125] M. Ju, T. Zhao, Q. Wen, W. Yu, N. Shah, Y. Ye, and C. Zhang, Magazine, 2017.
“Multi-task self-supervised graph neural networks enable strongefr task [156] P. Makwana, T. Kodinariya, and P. Makwana, “Review on determining
generalization,” in Proc. of ICLR, 2022. of cluster in k-means clustering review on determining number of clus-
[126] D. A. Reynolds, “Gaussian mixture models.” Encyclopedia of biomet- ter in k-means clustering,” International Journal of Advance Research
rics, 2009. in Computer Science and Management Studies, 2013.
[127] L. Li, S. Wang, X. Liu, E. Zhu, L. Shen, K. Li, and K. Li, “Local [157] Z. Wang, L. Zheng, Y. Li, and S. Wang, “Linkage based face clustering
sample-weighted multiple kernel clustering with consensus discrimina- via graph convolution network,” in Proc. of CVPR, 2019.
tive graph,” IEEE Trans. Neural Networks, 2022. [158] Y. Wang, Y. Zhang, F. Zhang, S. Wang, M. Lin, Y. Zhang, and X. Sun,
[128] J. Zhang, L. Li, S. Wang, J. Liu, Y. Liu, X. Liu, and E. Zhu, “Multiple “Ada-nets: Face clustering via adaptive neighbour discovery in the
kernel clustering with dual noise minimization,” in Proc. of the 30-th structure space,” arXiv preprint arXiv:2202.03800, 2022.
ACM Int. Conf. Multimedia, Lisboa, Portugal, 2022. [159] K. Zhang, T. Li, S. Shen, B. Liu, J. Chen, and Q. Liu, “Adaptive graph
[129] L. Li, J. Zhang, S. Wang, X. Liu, K. Li, and K. Li, “Multi-view bipartite convolutional network with attention graph clustering for co-saliency
graph clustering with coupled noisy feature filter,” IEEE Trans. Knowl. detection,” in Proc. of CVPR, 2020.
Data Eng., 2023. [160] H. Alwassel, D. Mahajan, B. Korbar, L. Torresani, B. Ghanem, and
[130] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” D. Tran, “Self-supervised learning by cross-modal audio-video cluster-
arXiv preprint arXiv:1412.6980, 2014. ing,” Proc. of NeurIPS, 2020.
[131] L. Bottou, “Large-scale machine learning with stochastic gradient [161] B. Chiu, S. K. Sahu, D. Thomas, N. Sengupta, and M. Mahdy, “Autoen-
descent,” in Proceedings of COMPSTAT’2010, 2010. coding keyword correlation graph for document clustering,” in Proc. of
[132] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and ACL, 2020.
J. Leskovec, “Open graph benchmark: Datasets for machine learning on [162] S. Qin, T. Jiang, S. Wu, N. Wang, and X. Zhao, “Graph convolution-
graphs,” Proc. of NeurIPS, 2020. based deep clustering for speech separation,” IEEE Access, 2020.
[133] R. S. Nickerson, “Confirmation bias: A ubiquitous phenomenon in many [163] Y. Zhang, Z. Wang, and J. Shang, “Clusterllm: Large language models
guises,” Review of general psychology, 1998. as a guide for text clustering,” arXiv preprint arXiv:2305.14871, 2023.
[134] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation [164] S. Cavallari, V. W. Zheng, H. Cai, K. C.-C. Chang, and E. Cambria,
learning on large graphs,” Proc. of NeurIPS, 2017. “Learning community embedding with community detection and node
[135] C. R. Shalizi and A. C. Thomas, “Homophily and contagion are generi- embedding on graphs,” in Proceedings of the 2017 ACM on Conference
cally confounded in observational social network studies,” Sociological on Information and Knowledge Management, 2017.
methods & research, 2011. [165] B. Rozemberczki, R. Davies, R. Sarkar, and C. Sutton, “Gemsec: Graph
embedding with self clustering,” in Proceedings of the 2019 IEEE/ACM
[136] X. Zheng, Y. Liu, S. Pan, M. Zhang, D. Jin, and P. S. Yu, “Graph
international conference on advances in social networks analysis and
neural networks for graphs with heterophily: A survey,” arXiv preprint
mining, 2019.
arXiv:2202.07082, 2022.
[166] C. Tu, X. Zeng, H. Wang, Z. Zhang, Z. Liu, M. Sun, B. Zhang, and
[137] W. Tu, S. Zhou, X. Liu, Y. Liu, Z. Cai, E. Zhu, Z. Changwang, and
L. Lin, “A unified framework for community detection and network
J. Cheng, “Initializing then refining: A simple graph attribute imputation
representation learning,” IEEE Transactions on Knowledge and Data
network,” IJCAI, pages, 2022.
Engineering, 2018.
[138] X. Chen, S. Chen, J. Yao, H. Zheng, Y. Zhang, and I. W. Tsang,
[167] F. Liu, S. Xue, J. Wu, C. Zhou, W. Hu, C. Paris, S. Nepal, J. Yang, and
“Learning on attribute-missing graphs,” IEEE transactions on pattern
P. S. Yu, “Deep learning for community detection: progress, challenges
analysis and machine intelligence, 2020.
and opportunities,” arXiv preprint arXiv:2005.08225, 2020.
[139] S. Chen and Y. C. Eldar, “Graph signal denoising via unrolling net- [168] I. Ahmed, T. Galoppo, X. Hu, and Y. Ding, “Graph regularized au-
works,” in Proc. of ICASSP, 2021. toencoder and its application in unsupervised anomaly detection,” IEEE
[140] T. H. Do, D. M. Nguyen, and N. Deligiannis, “Graph auto-encoder for Transactions on Pattern Analysis and Machine Intelligence, 2021.
graph signal denoising,” in Proc. of ICASSP, 2020. [169] X. Ma, J. Wu, S. Xue, J. Yang, C. Zhou, Q. Z. Sheng, H. Xiong, and
[141] J. Xia, H. Lin, Y. Xu, L. Wu, Z. Gao, S. Li, and S. Z. Li, “Towards L. Akoglu, “A comprehensive survey on graph anomaly detection with
robust graph neural networks against label noise,” 2020. deep learning,” IEEE Transactions on Knowledge and Data Engineer-
[142] P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” ing, 2021.
in Proceedings of ICML workshop on unsupervised and transfer learn- [170] Y. Chen, Z. Liu, J. Li, J. McAuley, and C. Xiong, “Intent contrastive
ing, 2012. learning for sequential recommendation,” in Proceedings of the ACM
[143] S. Zheng, Y. Song, T. Leung, and I. Goodfellow, “Improving the Web Conference 2022, 2022.
robustness of deep neural networks via stability training,” in Proc. of [171] G. Grunig, N. Durmus, Y. Zhang, Y. Lu, S. Pehlivan, Y. Wang, K. Doo,
CVPR, 2016. M. L. Cotrina-Vidal, R. Goldring, K. I. Berger et al., “Molecular
[144] A. Rozsa, M. Gunther, and T. E. Boult, “Towards robust deep neural clustering analysis of blood biomarkers in world trade center exposed
networks with bang,” arXiv preprint arXiv:1612.00138, 2016. community members with persistent lower respiratory symptoms,” In-
[145] M. Li, T. Zhang, Y. Chen, and A. J. Smola, “Efficient mini-batch ternational journal of environmental research and public health, 2022.
training for stochastic optimization,” in Proc. of KDD, 2014. [172] J. Xia, Y. Zhu, Y. Du, Y. Liu, and S. Z. Li, “A systematic survey of
[146] L. Lovász, “Random walks on graphs,” Combinatorics, Paul erdos is molecular pre-trained models,” arXiv preprint arXiv:2210.16484, 2022.
eighty, 1993. [173] H. Xue, V. Mallawaarachchi, Y. Zhang, V. Rajan, and Y. Lin, “Rep-
[147] B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong, “Towards k-means- bin: Constraint-based graph representation learning for metagenomic
friendly spaces: Simultaneous deep learning and clustering,” in Proc. of binning,” in Proc. of AAAI, 2022.
ICML, 2017. [174] H. Alashwal, M. El Halaby, J. J. Crouse, A. Abdalla, and A. A.
[148] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for Moustafa, “The application of unsupervised clustering methods to
networks,” in Proc. of KDD, 2016. alzheimer’s disease,” Frontiers in computational neuroscience, 2019.
[149] J. Xia, L. Wu, G. Wang, J. Chen, and S. Z. Li, “Progcl: Rethinking hard [175] R. Jiang, S. Han, Y. Yu, and W. Ding, “An access control model for
negative mining in graph contrastive learning,” in Proc. of ICML, 2022. medical big data based on clustering and risk,” Information Sciences,
[150] J. Newling and F. Fleuret, “Nested mini-batch k-means,” Proc. of 2023.
NeurIPS, 2016. [176] R. Dhaundiyal, A. Tripathi, K. Joshi, M. Diwakar, and P. Singh,
[151] S. Wang, X. Liu, L. Liu, W. Tu, X. Zhu, J. Liu, S. Zhou, and “Clustering based multi-modality medical image fusion,” in Journal of
E. Zhu, “Highly-efficient incomplete large-scale multi-view clustering Physics: Conference Series, 2020.
with consensus bipartite graph,” in Proc. of CVPR, 2022.
20

Yue Liu is a computer science master’s stu- Chenchen Fan is a postdoctoral researcher at
dent at National University of Dense Technol- the Chinese PLA General Hospital. Prior to this,
ogy (NUDT). His research interests include self- he obtained his PhD from the Institute of Au-
supervised learning, deep graph clustering, and tomation, Chinese Academy of Sciences and
knowledge graphs. He has published 10+ peer- his bachelor’s degree from the Beijing University
reviewed papers, including ICLR, ICML, AAAI, of Chemical Technology. His research interests
IJCAI, ACM MM, SIGIR, IEEE T-KDE, and IEEE include medical artificial intelligence, medical im-
T-NNLS. More detailed information is listed on age processing, etc.
his homepage: https://yueliu1999.github.io/.

Yan Zhuang is now a senior engineer at Medical


Jun Xia received the B. Eng honour degree Big Data Research Center, Chinese PLA Gen-
from Central South University (CSU), Changsha, eral Hospital. He has rich experience in hospital
China in 2020. He is currently pursuing Ph.D. informatization and research, expertized in med-
degree at Westlake University and Zhejiang Uni- ical big data and artificial intelligence technolo-
versity (ZJU), Hangzhou, China, supervised by gies. He has published 20+ papers as the first
Chair Professor Stan Z.Li (IEEE Fellow). His re- author and 90+ collaborative papers in the field
search interests include graph neural networks of medical big data. In addition, He was honored
and AI for scientific discovery. He has published with the Best Paper Award at CIKM 2017.
20+ papers in top-tier conferences such as IEEE
T-KDE, ICML, ICLR, CVPR, WWW, ACL, IJCAI,
ACM Multimedia, ECML-PKDD, etc.

Stan Z. Li received Ph.D. degree from Surrey


University, United Kingdom. He is currently a
chair professor and the director of Center for
Sihang Zhou received his PhD degree from Artificial Intelligence Research and Innovation
School of Computer, National University of De- (CAIRI), School of Engineering, Westlake Uni-
fense Technology (NUDT), China. He is now versity. His research interests include pattern
lecturer at College of Intelligence Science and recognition, machine learning, image process-
Technology, NUDT. His current research inter- ing, face recognition, biometrics, and intelligent
ests include machine learning and medical im- video surveillance. He has published 400+ pa-
age analysis. Dr. Zhou has published 40+ peer- pers in international journals and conferences
reviewed papers, including IEEE T-IP, IEEE T- and authored and edited eight books. He was an
NNLS, IEEE T-MI, Information Fusion, Medical associate editor of the IEEE T-PAMI. He was elevated to IEEE fellow for
Image Analysis, AAAI, MICCAI, etc. his contributions to face recognition, pattern recognition, and computer
vision, and he is a member of the IEEE Computer Society.

Xinwang Liu received his PhD degree from Na-


tional University of Defense Technology (NUDT),
Xihong Yang is pursuing his Ph.D. at Na-
China. He is now Professor of School of Com-
tional University of Defense Technology (NUDT),
puter, NUDT. His current research interests in-
China. His current research interests include
clude kernel learning and unsupervised fea-
semi-supervised learning, self-supervised learn-
ture learning. Dr. Liu has published 60+ peer-
ing, and graph neural networks. He has pub-
reviewed papers, including those in highly re-
lished several papers in top journals and con-
garded journals and conferences such as IEEE
ferences, such as IEEE T-NNLS, IEEE T-KDE,
T-PAMI, IEEE T-KDE, IEEE T-IP, IEEE T-NNLS,
IEEE T-AI, AAAI, ICML, ACM MM, etc.
IEEE T-MM, IEEE T-IFS, ICML, NeurIPS, ICCV,
CVPR, AAAI, IJCAI, etc. He serves as the asso-
ciated editor of T-NNLS, T-CYB and Information Fusion Journal. More
information can be found at https://xinwangliu.github.io/.

Ke Liang is currently pursuing a Ph.D. degree Kunlun He received his M.D. degree from
at the National University of Defense Technol- The 3rd Military Medical University, Chongqing,
ogy (NUDT). Before joining NUDT, he got BSc China, in 1988, and Ph.D. degree in Cardiol-
degree from Beihang University (BUAA) in 2017 ogy from Chinese PLA Medical School, Beijing,
and MSc degree from Pennsylvania State Uni- China, in 1999. He worked as a postdoctoral
versity (PSU) in 2021. His research interests in- research fellow at the Division of circulatory
clude knowledge graphs, graph learning, health- physiology of Columbia University from 1999 to
care AI, and urban computing. He has published 2003. He is the director and professor of Med-
several top-tier journal and conference papers, ical Big Data Research Center, Chinese PLA
e.g., IEEE T-KDE, IEEE T-NNLS, SIGIR, AAAI, General Hospital, Beijing, China. His research
ICML, ACM MM, etc. interests include medical big data and artificial
intelligence for cardiovascular disease.

You might also like