A Survey On Network Embedding
Abstract—Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, significant progress has been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development of network embedding methods, and point out future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards and primarily, we provide a comprehensive overview of a large number of network embedding methods in a systematic manner, covering the structure- and property-preserving network embedding methods, the network embedding methods with side information, and the advanced information preserving network embedding methods. Moreover, several evaluation approaches for network embedding and some useful online resources, including network data sets and software, are also reviewed. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.
1 INTRODUCTION
Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
834 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019
Fig. 2. A comparison between network topology based network analysis and network embedding based network analysis.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 835
edge or relationship between two nodes, then the distance of these two nodes in the embedding space should be relatively small. In this way, the network relationships can be well preserved. Second, the learned embedding space can effectively support network inference, such as predicting unseen links, identifying important nodes, and inferring node labels. It should be noted that an embedding space with only the goal of network reconstruction is not sufficient for network inference. Taking the link prediction problem as an example, if we only consider the goal of network reconstruction, the embedding vectors learned by SVD tend to fit all the observed links and zero values in the adjacency matrix, which may lead to overfitting and cannot infer unseen links.

In this paper, we survey the state-of-the-art works on network embedding and point out future research directions. In Section 2, we first categorize network embedding methods according to the types of information preserved in embedding, and summarize the commonly used models. We briefly review the traditional graph embedding methods and discuss the difference between these methods and the recent network embedding methods in Section 3. Then, in Sections 4, 5 and 6, we respectively review the methods on structure and property preserving network embedding, network embedding with side information, and advanced information preserving network embedding. In Section 7, we present a few evaluation scenarios and some online resources, including the data sets and codes, for network embedding. We conclude and discuss a series of possible future directions in Section 8.

2 CATEGORIZATION AND THE MODELS

To support network inference, more information beyond nodes and links needs to be preserved in the embedding space. Most research works on network embedding develop along this line in recent years. There are multiple ways to categorize them. In this paper, according to the types of information that are preserved in network embedding, we categorize the existing methods into three categories, that is, (1) network structure and property preserving network embedding, (2) network embedding with side information, and (3) advanced information preserving network embedding.

2.1 The Categorization of Network Embedding Methods

As mentioned before, network embedding usually has two goals, i.e., network reconstruction and network inference. The traditional graph embedding methods, mainly focusing on network reconstruction, have been widely studied. We will briefly review those methods in Section 3. Fu and Ma [9] present a more detailed survey. In this paper, we focus on the recently proposed network embedding methods aiming to address the goal of network inference. The categorization of the related works is shown in Fig. 3.

2.1.1 Structure and Property Preserving Network Embedding

Among all the information encoded in a network, network structures and properties are two crucial factors that largely affect network inference. Consider a network with only topology information. Many network analysis tasks, such as identifying important nodes and predicting unseen links, can be conducted in the original network space. However, as mentioned before, directly conducting these tasks based on network topology has a series of problems, which raises the question of whether we can learn a network embedding space purely based on the network topology information, such that these tasks can be well supported in this low-dimensional space. Motivated by this, attempts have been made to preserve rich structural information in network embedding, from nodes and links [10] to neighborhood structure [3], high-order proximities of nodes [6], and community structures [4]. All these types of structural information have been demonstrated useful and necessary in various network analysis tasks. Besides this structural information, network properties in the original network space are not ignorable in modeling the formation and evolution of networks. To name a few, network transitivity (i.e., triangle closure) is the driving force of link formation in networks [11], and the structural balance property plays an important role in the evolution of signed networks [12]. Preserving these properties in a network embedding space is, however, challenging due to the inhomogeneity between the network space and the embedding vector space. Some recent studies begin to look into this problem and demonstrate the possibility of aligning these two spaces at the property level [8], [13].

goals. The commonly used models include matrix factorization, random walk, deep neural networks and their variations.
because of their huge successes in other fields. The key challenges are how to make deep models fit network data, and how to impose network structure and property-level constraints on deep models. Some representative methods, such as SDNE [6], SDAE [26], and SiNE [13], propose deep learning models for network embedding to address these challenges. At the same time, deep neural networks are also well known for their advantages in providing end-to-end solutions. Therefore, in problems where advanced information is available, it is natural to exploit deep models to come up with an end-to-end network embedding solution. For instance, some deep model based end-to-end solutions are proposed for cascade prediction [18] and network alignment [22].

The network embedding models are not limited to those mentioned in this subsection. Moreover, the three kinds of models are not mutually exclusive, and their combinations can lead to new solutions. More models and details will be discussed in later sections.

3 NETWORK EMBEDDING VERSUS GRAPH EMBEDDING

The goal of graph embedding is similar to that of network embedding, that is, to embed a graph into a low-dimensional vector space [27]. There is a rich literature on graph embedding. Fu and Ma [9] provide a thorough review of the traditional graph embedding methods. Here we only present some representative and classical methods on graph embedding, aiming to demonstrate the critical differences between graph embedding and the current network embedding.

3.1 Representative Graph Embedding Methods

Graph embedding methods were originally studied as dimension reduction techniques. A graph is usually constructed from a feature represented data set, like an image data set. Isomap [28] first constructs a neighborhood graph G using connectivity algorithms such as K nearest neighbors (KNN), i.e., connecting data entries i and j if i is one of the K nearest neighbors of j. Then, based on G, the shortest path d_{ij}^G between entries i and j in G can be computed. Consequently, for all the N data entries in the data set, we have the matrix of graph distances D_G = \{d_{ij}^G\}. Finally, the classical multidimensional scaling (MDS) method is applied to D_G to obtain the coordinate vector u_i for entry i, which aims to minimize the following function:

\sum_{i=1}^{N} \sum_{j=1}^{N} (d_{ij}^G - \|u_i - u_j\|)^2. \quad (1)

Indeed, Isomap learns the representation u_i of entry i, which approximately preserves the geodesic distances of the entry pairs in the low-dimensional space.

The key problem of Isomap is its high complexity due to the computing of pair-wise shortest paths. Locally linear embedding (LLE) [29] is proposed to eliminate the need to estimate the pairwise distances between widely separated entries. LLE assumes that each entry and its neighbors lie on or close to a locally linear patch of a manifold. To characterize the local geometry, each entry can be reconstructed from its neighbors as follows:

\min_W \sum_i \| x_i - \sum_j W_{ij} x_j \|^2, \quad (2)

where the weight W_{ij} measures the contribution of the entry x_j to the reconstruction of entry x_i. Finally, in the low-dimensional space, LLE constructs a neighborhood-preserving mapping based on locally linear reconstruction as follows:

\min_U \sum_i \| u_i - \sum_j W_{ij} u_j \|^2. \quad (3)

By optimizing the above function, the low-dimensional representation matrix U, which preserves the neighborhood structure, can be obtained.

Laplacian eigenmaps (LE) [30] also begins with constructing a graph using \epsilon-neighborhoods or K nearest neighbors. Then the heat kernel [31] is utilized to choose the weight W_{ij} of nodes i and j in the graph. Finally, the representation u_i of node i can be obtained by minimizing the following function:

\sum_{i,j} \|u_i - u_j\|^2 W_{ij} = \mathrm{tr}(U^T L U), \quad (4)

where L = D - W is the Laplacian matrix, and D is the diagonal matrix with D_{ii} = \sum_j W_{ji}. In addition, the constraint U^T D U = I is introduced to avoid trivial solutions. Furthermore, the locality preserving projection (LPP) [32], a linear approximation of the nonlinear LE, is proposed. It introduces a transformation matrix A such that the representation u_i of entry x_i is u_i = A^T x_i. LPP computes the transformation matrix A first, and finally the representation u_i can be obtained.

These methods are extended in the rich literature of graph embedding by considering different characteristics of the constructed graphs [9].

3.2 Major Differences

Network embedding and graph embedding have substantial differences in objectives and assumptions. As mentioned before, network embedding has two goals, i.e., reconstructing original networks and supporting network inference. The objective functions of graph embedding methods mainly target the goal of graph reconstruction. As discussed before, the embedding space learned for network reconstruction is not necessarily good for network inference. Therefore, graph embedding can be regarded as a special case of network embedding, and the recent research progress on network embedding pays more attention to network inference.

Moreover, graph embedding mostly works on graphs constructed from feature represented data sets, where the proximity among nodes encoded by the edge weights is well defined in the original feature space. In contrast, network embedding mostly works on naturally formed networks, such as social networks, biology networks, and e-commerce networks. In those networks, the proximities among nodes are not explicitly or directly defined. The definition of node proximities depends on specific analytic tasks and application scenarios. Therefore, we have to incorporate rich information, such as network structures, properties, side information and advanced information, in network embedding to facilitate different problems and applications.
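To make the classical pipeline of Section 3.1 concrete, here is a minimal Laplacian eigenmaps sketch in Python. The toy graph, its weights, and the target dimension are illustrative choices for this example, not values taken from [30]:

```python
import numpy as np

# A small undirected graph given by its symmetric weight matrix W:
# two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3, so the
# first non-trivial eigenvector should separate the two triangles.
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

def laplacian_eigenmaps(W, dim):
    """Minimize sum_ij ||u_i - u_j||^2 W_ij = tr(U^T L U) s.t. U^T D U = I.

    Solved as the generalized eigenproblem L u = lambda D u via the
    symmetrically normalized Laplacian; the trivial constant solution
    (lambda = 0) is discarded.
    """
    D = np.diag(W.sum(axis=1))
    L = D - W
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
    L_sym = d_inv_sqrt @ L @ d_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    U = d_inv_sqrt @ vecs[:, 1:dim + 1]      # skip the trivial eigenvector
    return U

U = laplacian_eigenmaps(W, dim=2)
# Nodes 0-2 and nodes 3-5 receive opposite signs in the first coordinate.
print(np.sign(U[:, 0]))
```

The same skeleton covers Isomap and LLE as well: each method differs only in how the neighborhood graph is built and which matrix is decomposed.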
In the rest of the paper, we mainly focus on the network embedding methods with the goal of supporting network inference.

4 STRUCTURE AND PROPERTY PRESERVING NETWORK EMBEDDING

In essence, one basic requirement of network embedding is to appropriately preserve network structures and capture the properties of networks. Often, network structures include first-order structure and higher-order structure, such as second-order structure and community structure. Networks of different types have different properties. For example, directed networks have the asymmetric transitivity property. The structural balance theory is widely applicable to signed networks.

In this section, we review the representative methods of structure preserving network embedding and property preserving network embedding.

4.1 Structure Preserving Network Embedding

Network structures can be categorized into different groups that present at different granularities. The commonly exploited network structures in network embedding include neighborhood structure, high-order node proximity and network communities.

DeepWalk [3] is proposed for learning the representations of nodes in a network, and is able to preserve the neighbor structures of nodes. DeepWalk discovers that the distribution of nodes appearing in short random walks is similar to the distribution of words in natural language. Motivated by this observation, the Skip-Gram model [24], a widely used word representation learning model, is adopted by DeepWalk to learn the representations of nodes. Specifically, as shown in Fig. 4, DeepWalk adopts a truncated random walk on a network to generate a set of walk sequences. For each walk sequence s = \{v_1, v_2, \ldots, v_s\}, following Skip-Gram, DeepWalk aims to maximize the probability of the neighbors of node v_i in this walk sequence as follows:

\max_f \Pr(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid f(v_i)) = \prod_{j=i-w, j \neq i}^{i+w} \Pr(v_j \mid f(v_i)), \quad (5)

where w is the window size, f(v_i) is the current representation of v_i, and \{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i are the local context nodes of v_i. Finally, hierarchical soft-max [33] is used to efficiently infer the embeddings.

Node2vec [25] demonstrates that DeepWalk is not expressive enough to capture the diversity of connectivity patterns in a network. Node2vec defines a flexible notion of a node's network neighborhood and designs a second order random walk strategy to sample the neighborhood nodes, which can smoothly interpolate between breadth-first sampling (BFS) and depth-first sampling (DFS). Node2vec is able to learn representations that embed nodes from the same network community closely, as well as representations where nodes sharing similar roles have similar embeddings.

LINE [10] is proposed for large scale network embedding, and can preserve the first and second order proximities. The first order proximity is the observed pairwise proximity between two nodes, such as the observed edge between nodes 6 and 7 in Fig. 5. The second order proximity is determined by the similarity of the "contexts" (neighbors) of two nodes. For example, the second order similarity between nodes 5 and 6 can be obtained from their neighborhoods 1, 2, 3, and 4 in Fig. 5. Both the first order and second order proximities are important in measuring the relationships between two nodes. The first order proximity can be measured by the joint probability distribution between two nodes v_i and v_j as

p_1(v_i, v_j) = \frac{1}{1 + \exp(-u_i^T u_j)}. \quad (6)

The second order proximity is measured by the conditional probability of a context node v_j given node v_i:

p_2(v_j \mid v_i) = \frac{\exp(u_j^T u_i)}{\sum_k \exp(u_k^T u_i)}. \quad (7)

Fig. 5. An example of the first-order and second-order structures in a network. Image extracted from [10].
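The sampling stage of DeepWalk described above is simple to sketch. The toy adjacency list, walk length, and number of walks per node below are illustrative values only, not the settings used in [3]:

```python
import random

# A toy undirected graph as an adjacency list (illustrative only).
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

def random_walks(graph, walks_per_node=10, walk_length=8, seed=0):
    """Generate the truncated random-walk 'corpus' that DeepWalk feeds to
    Skip-Gram: each walk is a sequence of nodes, treated like a sentence."""
    rng = random.Random(seed)
    walks = []
    nodes = list(graph)
    for _ in range(walks_per_node):
        rng.shuffle(nodes)            # start one walk from every node per round
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

walks = random_walks(graph)
print(len(walks), walks[0])
```

The resulting walks would then be fed, as sentences of node "words", to a Skip-Gram implementation to obtain the node embeddings.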
Fig. 7. Overview of the method proposed by Cao et al. [26]. Image extracted from [26].
Fig. 11. Overview of the method proposed by Chang et al. [16]. Image extracted from [16].
network with two types of data (e.g., images and texts), there are three types of edges, i.e., image-image, text-text, and image-text. The nonlinear embeddings of images and texts are learned by a CNN model and fully connected layers, respectively. By cascading an extra linear embedding layer, the representations of images and texts can be mapped to a common space. In the common space, the similarities between data from different modalities can be directly measured, so that if there is an edge in the original heterogeneous network, the pair of data has similar representations.

Huang and Mamoulis [59] propose a meta path similarity preserving heterogeneous information network embedding algorithm. To model a particular relationship, a meta path [60] is a sequence of object types with edge types in between. They develop a fast dynamic programming approach to calculate the truncated meta path based proximities, whose time complexity is linear in the size of the network. They adopt a similar strategy as LINE [10] to preserve the proximity in the low dimensional space.

Xu et al. [61] propose a network embedding method for coupled heterogeneous networks. A coupled heterogeneous network consists of two different but related homogeneous networks. For each homogeneous network, they adopt the same function (Eq. (6)) as LINE to model the relationships between nodes. Then a harmonious embedding matrix is introduced to measure the closeness between nodes of different networks. Because the inter-network edges are able to provide complementary information in the presence of intra-network edges, the learned embeddings of nodes also perform well on several tasks.

5.3 Summary

In the methods preserving side information, side information introduces additional proximity measures so that the relationships between nodes can be learned more comprehensively. Their difference is the way of integrating network structures and side information. Many of them are natural extensions of structure preserving network embedding methods.

6 ADVANCED INFORMATION PRESERVING NETWORK EMBEDDING

In this section, we review network embedding methods that take additional advanced information into account so as to solve some specific analytic tasks. Different from side information, the advanced information refers to the supervised or pseudo supervised information in a specific task.

6.1 Information Diffusion

Information diffusion [62] is a ubiquitous phenomenon on the web, especially in social networks. Many real applications, such as marketing, public opinion formation, and epidemics, are related to information diffusion. Most of the previous studies on information diffusion are conducted in original network spaces.

Recently, Simon et al. [63] propose a social network embedding algorithm for predicting information diffusion. The basic idea is to map the observed information diffusion process into a heat diffusion process modeled by a diffusion kernel in a continuous space. Specifically, the diffusion kernel in a d-dimensional Euclidean space is defined as

K(t, j, i) = (4\pi t)^{-d/2} e^{-\frac{\|j - i\|^2}{4t}}. \quad (16)

It models the heat at location i at time t when an initial unit heat is positioned at location j, which also models how information spreads between nodes in a network.

The goal of the proposed algorithm is to learn the representations of nodes in the latent space such that the diffusion kernel can best explain the cascades in the training set. Given the representation u_j of the initial contaminated node j in cascade c, the contamination score of node i can be computed by

K(t, j, i) = (4\pi t)^{-d/2} e^{-\frac{\|u_j - u_i\|^2}{4t}}. \quad (17)

The intuition of Eq. (17) is that the closer a node in the latent space is to the source node, the sooner it is infected by information from the source node. As the cascade c offers guidance for the information diffusion of nodes, we expect the contamination score to be as closely consistent with c as possible, which gives rise to the following empirical risk function:

L(U) = \sum_c D(K(:, j, :), c), \quad (18)

where function D is a measure of the difference between the predicted score and the observed diffusion in c. By minimizing Eq. (18) and reformulating it as a ranking problem, the optimal representations U of nodes can be obtained.

The cascade prediction problem here is defined as predicting the increment of cascade size after a given time interval [18]. Li et al. [18] argue that the previous work on cascade prediction all depends on bags of hand-crafted features to represent the cascade and network structures. Instead, they present an end-to-end deep learning model to solve this problem using the idea of network embedding, as illustrated in Fig. 12. Similar to DeepWalk [3], they perform a random walk over a cascade graph to sample a set of paths. Then the Gated Recurrent Unit (GRU) [64], a specific type of recurrent neural network [65], is applied to these paths to learn the embeddings of these paths. An attention mechanism is then used to assemble these embeddings to learn the representation of the cascade graph. Once the representation of the cascade is known, a multi-layer perceptron [66] can be adopted to output the final predicted size of this cascade. The whole procedure is able to learn the representation of a cascade graph in an end-to-end manner. The experimental results on the Twitter and Aminer networks show promising performance on this task.

6.2 Anomaly Detection

Anomaly detection has been widely investigated in previous work [67]. Anomaly detection in networks aims to infer structural inconsistencies, which means the anomalous nodes that connect to various diverse influential communities [21], [68], such as the red node in Fig. 13. Hu et al. [21] propose a network embedding based method for anomaly detection. In particular, in the proposed model, the kth
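The diffusion-kernel scoring of Eqs. (16) and (17) can be illustrated directly. The embeddings and time parameter below are made-up values for the example:

```python
import numpy as np

def contamination_score(u_src, u, t, d):
    """Heat-kernel contamination score of Eq. (17): the heat at embedding u
    at time t when a unit source is placed at u_src in a d-dimensional space."""
    dist2 = np.sum((u_src - u) ** 2)
    return (4 * np.pi * t) ** (-d / 2) * np.exp(-dist2 / (4 * t))

# Toy 2-d embeddings (made up): node b is closer to the source than node c,
# so it receives a higher score, i.e., it is predicted to be infected sooner.
src = np.array([0.0, 0.0])
b = np.array([0.5, 0.0])
c = np.array([2.0, 0.0])
t = 1.0
assert contamination_score(src, b, t, d=2) > contamination_score(src, c, t, d=2)
```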
Fig. 12. The end-to-end pipeline of DeepCas proposed by Li et al. [18]. Image extracted from [18].
element u_{ki} in the embedding u_i of node i represents the correlation between node i and community k. Then, they assume that the community memberships of two linked nodes should be similar. Therefore, they minimize the following objective function:

L = \sum_{(i,j) \in E} \|u_i - u_j\|^2 + \alpha \sum_{(i,j) \notin E} (\|u_i - u_j\| - 1)^2. \quad (19)

This optimization problem can be solved by the gradient descent method. By taking the neighbors of a node into account, the embedding of the node can be obtained by a weighted sum of the embeddings of all its neighbors. An anomaly node in this context is one connecting to a set of different communities. Since the learned embedding of nodes captures the correlations between nodes and communities, based on the embedding, they propose a new measure to indicate the anomalousness level of a node. The larger the value of the measure, the higher the propensity for a node to be an anomaly node.

Fig. 13. The anomalous (red) nodes in embedding, and A, B, C, and D are four communities [21]. Image extracted from [21].

First, Man et al. [22] extend the original sparse networks G_s and G_t to denser networks. The basic idea is that, given a pair of users with anchor links, if they have a connection in one network, so do their counterparts in the other network [69]; in this way, more links are added to the original networks. For a pair of nodes i and j whose representations are u_i and u_j, respectively, combined with the negative sampling strategy, they use the following function to preserve the structures of G_s and G_t in a vector space:

\log \sigma(u_i^T u_j) + \sum_{k=1}^{K} E_{v_k \sim P_n(v)} [\log(1 - \sigma(u_i^T u_k))], \quad (20)

where \sigma(x) = 1/(1 + \exp(-x)). The first term models the observed edges, and the second term samples K negative edges.

Then, given the observed anchor links (v_i^s, v_j^t) \in T and the representations u_i and u_j, they aim to learn a mapping

Fig. 14. The illustrative diagram of network embedding for anchor link prediction proposed by Man et al. [22]. Image extracted from [22].
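The structure-preserving objective of Eq. (20) can be evaluated with a few lines of code. The embeddings below, and the Gaussian noise used to stand in for the noise distribution P_n(v), are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_objective(u_i, u_j, negatives):
    """Per-edge objective of Eq. (20): reward a high score for the observed
    edge (i, j) and low scores for K sampled negative nodes. In training this
    would be maximized; here we only evaluate it."""
    pos = np.log(sigmoid(u_i @ u_j))
    neg = sum(np.log(1.0 - sigmoid(u_i @ u_k)) for u_k in negatives)
    return pos + neg

# Toy embeddings (made up): aligned vectors for a true edge, random noise
# vectors as the K = 5 sampled negatives.
u_i = np.array([1.0, 0.5])
u_j = np.array([0.9, 0.6])
negatives = [rng.normal(size=2) for _ in range(5)]
print(edge_objective(u_i, u_j, negatives))
```

An aligned pair scores strictly higher than a misaligned one under the same negatives, which is exactly what pushes linked nodes together during optimization.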
TABLE 1
A Summary of Different Types of Network Embedding Methods

The table marks, for each method, its categorization (network structure alone, side information, or advanced information) and its technique (matrix factorization, random walk, deep neural network, or others). Methods covered: DeepWalk [3], LINE [10], GraRep [34], SDNE [6], Node2vec [25], M-NMF [4], Cao et al. [26], Chen et al. [40], HOPE [8], SiNE [13], Ou et al. [44], MMDW [14], Le et al. [50], TADW [50], Sun et al. [52], Pan et al. [53], LANE [54], Yann et al. [58], Chang et al. [16], Huang et al. [59], Xu et al. [61], Simon et al. [63], Li et al. [18], Hu et al. [21], and Man et al. [22].
function f parameterized by \theta so as to bridge these two representations. The loss function is defined as:

\|f(u_i; \theta) - u_j\|_F. \quad (21)

The mapping function can be linear, or non-linear via a Multi-Layer Perceptron (MLP) [66]. By optimizing Eq. (20) and Eq. (21) simultaneously, representations that preserve the network structure and respect the observed anchor links can be learned.

6.4 Summary

Advanced information preserving network embedding usually consists of two parts. One is to preserve the network structure so as to learn the representations of nodes. The other is to establish the connection between the representations of nodes and the target task. The first one is similar to structure and property preserving network embedding, while the second one usually needs to consider the domain knowledge of a specific task. The domain knowledge encoded by the advanced information makes it possible to develop end-to-end solutions for network applications. Compared with hand-crafted network features, such as numerous network centrality measures, the combination of advanced information and network embedding techniques enables representation learning for networks. Many network applications may benefit from this new paradigm.

All the aforementioned network embedding methods are summarized in Table 1. Because some works lack technique details, here we mainly summarize the time complexity of some representative network embedding methods, shown in Table 2. In particular, K is the representation dimension, and N and |E| are the numbers of nodes and edges, respectively. I is the number of iterations. L is the walk length and w is the number of random-walk trials. e is the number of epochs, S is the number of sampled training triplets, and J is a constant related to the structure of the deep neural network. m is the length of node attributes. nz(X) is the number of non-zeros in matrix X. As we can see, the time complexity of some network embedding methods, such as DeepWalk, LINE, and Node2vec, is linear with respect to the number of nodes N. Therefore, these methods are usually scalable to large networks. In contrast, some works, such as GraRep, M-NMF, and LANE, have quadratic complexity, which may cause scalability issues and limit their application to large networks.

TABLE 2
The Complexity of Some Representative Network Embedding Methods

Method            Time complexity
DeepWalk [3]      O(KN log N)
LINE [10]         O(K|E|)
GraRep [34]       O(N|E| + KN^2)
SDNE [6]          O(KI|E|)
Node2vec [25]     O(KN)
M-NMF [4]         O(KIN^2)
Chen et al. [40]  O(K max(|E|, NLw))
HOPE [8]          O(K^2 I|E|)
SiNE [13]         O(eNSJ)
TADW [50]         O(N|E| + KI|E| + KmIN + K^2 IN)
TriDNR [53]       O(K nz(X) log(m) + KN log(N))
LANE [54]         O(mN^2 + KIN^2)
Yann et al. [58]  O(|E|)
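When the mapping function f in Eq. (21) is linear, minimizing the loss over the observed anchor pairs reduces to ordinary least squares. A sketch on synthetic embeddings, where all values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (made up): source-network embeddings U_s and their anchor-linked
# counterparts U_t in the target network, related by an unknown linear map
# plus a little noise.
W_true = rng.normal(size=(2, 2))
U_s = rng.normal(size=(20, 2))
U_t = U_s @ W_true.T + 0.01 * rng.normal(size=(20, 2))

# With a linear f(u; theta) = W u, minimizing ||f(u_i) - u_j|| over the
# observed anchor pairs is an ordinary least-squares problem.
W_hat, *_ = np.linalg.lstsq(U_s, U_t, rcond=None)
W_hat = W_hat.T

# The recovered map predicts target embeddings for unmatched source nodes,
# which is the basis for ranking anchor-link candidates.
u_new = rng.normal(size=2)
pred = W_hat @ u_new
print(np.round(W_hat - W_true, 3))
```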
TABLE 3
A Summary of Real World Networks
TABLE 4
A Summary of the Source Code
7.1.2 Citation Networks

DBLP [73]. This network represents the citation relationships between authors and papers. One instance of the data set can be found at http://arnetminer.org/citation.

Cora [74]. This network represents the citation relationships between scientific publications. Besides the link information, each publication is also associated with a word vector indicating the absence/presence of the corresponding words from the dictionary. One instance of the data set can be found at https://linqs.soe.ucsc.edu/node/236.

Citeseer [74]. This network, similar to Cora, also consists of scientific publications and their citation relationships. One instance of the data set can be downloaded at https://linqs.soe.ucsc.edu/node/236.

ArXiv [75], [76]. This is the collaboration network constructed from the ArXiv website. One instance of the data set can be found at http://snap.stanford.edu/data/ca-AstroPh.html.

7.1.3 Language Networks

Wikipedia [77]. This is a word co-occurrence network from the English Wikipedia pages. One instance of the data set can be found at http://www.mattmahoney.net/dc/textdata.

7.1.4 Biological Networks

PPI [78]. This is a subgraph of the biological network that represents the pairwise physical interactions between proteins in yeast. One instance of the data set can be downloaded at http://konect.uni-koblenz.de/networks/maayan-vidal.

7.2 Node Classification

Given some nodes with known labels in a network, the node classification problem is to classify the rest of the nodes into different classes. Node classification is one of the most primary applications of network embedding [3], [10]. Essentially, node classification based on network embedding can be divided into three steps. First, a network embedding algorithm is applied to embed the network into a low dimensional space. Then, the nodes with known labels are used as the training set. Last, a classifier, such as Liblinear [79], is learned from the training set. Using the trained classifier, we can infer the labels of the rest of the nodes. The popularly used evaluation metrics for the multi-label classification problem include Micro-F1 and Macro-F1 [70].

The multi-label classification application has been successfully tested on four categories of data sets, namely social networks (BLOGCATALOG [70], FLICKR [70], and YOUTUBE [71]), citation networks (DBLP [73], Cora [74], and Citeseer [74]), language networks (Wikipedia [77]), and biological networks (PPI [78]).

Specifically, a social network is usually a communication network among users on online platforms. DeepWalk [3], GraRep [34], SDNE [6], node2vec [25], and LANE [54] conduct classification on BLOGCATALOG to evaluate the performance. Also, the classification performance on FLICKR has been assessed in [3], [6], [10], [54]. Some studies [3], [6], [10] apply their algorithms to the Youtube network, which also achieves promising classification results. A citation network usually represents the citation relationships between authors or between papers. For example, [10], [53] use the DBLP network to test the classification performance. Cora is used in [14], [15]. Citeseer is used in [14], [15], [53]. The classification performance on language networks, such as Wikipedia, is also widely studied [10], [14], [15], [25]. The Protein-Protein Interactions (PPI) network is used in [25]. Based on NUS-WIDE [80], a heterogeneous network extracted from Flickr, Chang et al. [16] validated the superior classification performance of network embedding on heterogeneous networks.

To summarize, network embedding algorithms have been widely used on various networks and have well demonstrated their effectiveness on node classification.

7.3 Link Prediction

Link prediction, as one of the most fundamental problems in network analysis, has received a considerable amount of attention [7], [82]. It aims to estimate the likelihood of the existence of an edge between two nodes based on the observed network structure [83]. Since network embedding algorithms are able to learn vector based features for each node, the similarity between nodes can be easily estimated, for example, by the inner product or the cosine similarity. A larger similarity implies that the two nodes may have a higher propensity to be linked. Generally, precision@k and Mean Average Precision (MAP) are used to evaluate the link prediction performance [6].

The popularly used real networks for the link prediction task can be divided into three categories: citation networks (ARXIV [75], [76] and DBLP1), social networks (SN-TWeibo2, SN-Twitter [72], Facebook [76], Epinions3, and Slashdot4), and biological networks (PPI [78]).

1. http://dblp.uni-trier.de/
2. http://www.kddcup2012.org/c/kddcup2012-track1/data
3. http://www.epinions.com/
4. http://slashdot.org/
5. https://arxiv.org/
6. https://twitter.com/
7. http://t.qq.com/

Specifically, [6] and [25] test the effectiveness on ARXIV5. HOPE [8] applies network embedding to link prediction on two directed networks, SN-Twitter, which is a subnetwork of Twitter6, and SN-TWeibo, which is a subnetwork of the social network in Tencent Weibo7. Node2vec [25] tests the performance of link prediction on a social network, Facebook, and a biological network, PPI. EOE [61] uses DBLP to demonstrate the effectiveness on citation networks. Based on two social networks, Epinions and Slashdot, SiNE [13] shows the superior performance of signed network embedding on link prediction.

To sum up, network embedding is able to capture inherent network structures, and thus it is naturally suitable for link prediction applications. Extensive experiments on various networks have demonstrated that network embedding can tackle link prediction effectively.
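The similarity based link prediction procedure described above can be sketched in a few lines. The embeddings, candidate pairs, and held-out edges below are toy values (any embedding method could produce the vectors); this is an illustrative sketch, not a reference implementation:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def precision_at_k(scored_pairs, true_edges, k):
    """precision@k: fraction of the k highest-scored node pairs that are
    actual edges in the held-out test set."""
    top_k = sorted(scored_pairs, key=lambda p: p[2], reverse=True)[:k]
    hits = sum(1 for (u, v, _) in top_k
               if (u, v) in true_edges or (v, u) in true_edges)
    return hits / k

# Toy embeddings for four nodes (in practice these would come from a
# network embedding method such as DeepWalk, LINE, or SDNE).
emb = {
    0: np.array([1.0, 0.1]),
    1: np.array([0.9, 0.2]),
    2: np.array([-0.8, 1.0]),
    3: np.array([-0.7, 0.9]),
}
candidates = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
scored = [(u, v, cosine_similarity(emb[u], emb[v])) for u, v in candidates]
held_out = {(0, 1), (2, 3)}  # hypothetical test edges
print(precision_at_k(scored, held_out, k=2))  # → 1.0
```

MAP is computed analogously by averaging the precision at every rank where a true edge is retrieved.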
Fig. 15. Network visualization of 20-NewsGroup by different network embedding algorithms, i.e., SDNE [6], LINE [10], DeepWalk [3], GraRep [34],
and LE [81]. Image extracted from SDNE [6].
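The kind of two dimensional layout shown in Fig. 15 is produced by projecting learned embeddings onto the plane; t-SNE [90] is the usual tool. As a dependency-free sketch of the same idea, the toy example below uses a plain-NumPy PCA projection as a stand-in for t-SNE, with synthetic "embeddings" of two node classes:

```python
import numpy as np

def project_2d(embeddings):
    """Project node embeddings to 2-D via PCA (a simple stand-in for
    t-SNE): center the data and keep the top two principal components."""
    X = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix; the rows of vt are principal directions.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T

rng = np.random.default_rng(0)
# Toy "learned embeddings": two classes of nodes in 8 dimensions.
class_a = rng.normal(loc=0.0, scale=0.1, size=(20, 8))
class_b = rng.normal(loc=1.0, scale=0.1, size=(20, 8))
coords = project_2d(np.vstack([class_a, class_b]))  # shape (40, 2)
# Nodes of the same class land close together in the plane, mirroring
# the grouped colors in Fig. 15; plotting coords with per-class colors
# (e.g., via matplotlib) would reproduce such a picture.
```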
7.4 Node Clustering

Node clustering is to divide the nodes in a network into clusters such that the nodes within the same cluster are more similar to each other than the nodes in different clusters. Network embedding algorithms learn representations of nodes in low dimensional vector spaces, so many typical clustering methods, such as Kmeans [84], can be directly adopted to cluster nodes based on their learned representations.

Many evaluation criteria have been proposed for clustering evaluation. Accuracy (AC) and normalized mutual information (NMI) [85] are frequently used to assess the clustering performance on graphs and networks.

The node clustering performance is tested on three types of networks: social networks (e.g., Facebook [86] and YELP [59]), citation networks (e.g., DBLP [60]), and document networks (e.g., 20-NewsGroup [87]). In particular, [16] extracts a social network from a social blogging site. It uses the TF-IDF features extracted from the blogs as the features of blog users and the "following" behaviors to construct the linkages. It successfully applies network embedding to the node clustering task. [4] uses the Facebook social network to demonstrate the effectiveness of community preserving network embedding on node clustering. [59] is applied to more social networks, including MOVIE, a network extracted from YAGO [88] that contains knowledge about movies, YELP, a network extracted from YELP that is about reviews given to restaurants, and GAME, extracted from Freebase [89], which is related to video games. [26] tests the node clustering performance on a document network, the 20-NewsGroup network, which consists of documents.

The node clustering performance on citation networks is tested in [59] by clustering authors in DBLP. The results show the superior clustering performance on citation networks.

In summary, node clustering based on network embedding is tested on different types of networks. Network embedding has become an effective method to solve the node clustering problem.

7.5 Network Visualization

Another important application of network embedding is network visualization, that is, generating a meaningful visualization that lays out a network on a two dimensional space. By applying a visualization tool, such as t-SNE [90], to the learned low dimensional representations of nodes, it is easy for users to see a big picture of a sophisticated network, so that the community structure or node centrality can be easily revealed.

More often than not, the quality of network visualization by different network embedding algorithms is evaluated visually. Fig. 15 is an example by SDNE [6], where SDNE is applied to 20-NewsGroup. In Fig. 15, each document is mapped into a two dimensional space as a point, and different colors on the points represent the labels. As can be seen, network embedding preserves the intrinsic structure of the network, where similar nodes are closer to each other than dissimilar nodes in the low-dimensional space. Also, LINE [10], GraRep [34], and EOE [61] are applied to a citation network, DBLP, and generate meaningful layouts of the network. Pan et al. [53] show the visualization of another citation network, Citeseer-M10 [91], consisting of scientific publications from ten distinct research areas.

7.6 Open Source Software

In Table 4, we provide a collection of links where one can find the source code of various network embedding methods.

8 CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS

The above survey of the state-of-the-art network embedding algorithms clearly shows that it is still a young and promising research field. To apply network embedding to tackle
appropriate methods. In Fig. 16 we show the relationship
among different types of network embedding methods dis-
cussed in this survey.
The structure and property preserving network embedding is the foundation. If one cannot preserve well the network structure and retain the important network properties, serious information is lost in the embedding space, which hurts the analytic tasks in the sequel. Based on the structure and property preserving network embedding, one may apply off-the-shelf machine learning methods. If some side information is available, it can be incorporated into network embedding. Furthermore, the domain knowledge of certain applications can be considered as advanced information.
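This off-the-shelf reuse is exactly the three step recipe of Section 7.2: embed the network, train a classifier on the labeled nodes, then label the rest. The sketch below uses toy 2-D "embeddings" and a nearest-centroid classifier as a minimal stand-in for Liblinear; all names and values are illustrative assumptions:

```python
import numpy as np

def fit_centroids(X, y):
    """Step 2: train an off-the-shelf classifier (here, nearest centroid
    as a minimal stand-in for Liblinear) on the labeled nodes."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    """Step 3: assign each remaining node to the class of its nearest
    centroid in the embedding space."""
    labels = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in labels])
    return np.array([labels[i] for i in dists.argmin(axis=0)])

# Step 1 (assumed done by some embedding method): toy node embeddings.
X_train = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])  # hypothetical known labels
X_rest = np.array([[0.05, 0.05], [0.95, 0.95]])  # unlabeled nodes
print(predict(fit_centroids(X_train, y_train), X_rest))  # → [0 1]
```

In practice the predicted labels would then be scored against ground truth with Micro-F1 and Macro-F1, as in Section 7.2.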
Fig. 16. Relationship among different types of network embedding methods.

In the rest of this section, we discuss several interesting directions for future work.
8.1 More Structures and Properties

Although various methods are proposed to preserve structures and properties, such as first order and high order proximities, communities, asymmetric transitivity, and structural balance, due to the complexity of real world networks, there are still some particular structures that are not fully considered in the existing network embedding methods. For example, how to incorporate network motifs [92], one of the most common higher-order structures in a network, into network embedding remains an open problem. Also, more complex local structures of a node can be considered to provide higher level constraints. The current assumption of network embedding is usually based on the pairwise structure, that is, if two nodes have a link, then their representations are similar. This assumption can work well for some applications, such as link prediction, but it cannot encode the centrality information of nodes, because the centrality of a node is usually related to a more complex structure. As another example, in several real world applications, an edge may involve more than two nodes, known as a hyperedge. Such a hypernetwork naturally indicates richer relationships among nodes and has its own characteristics. Hypernetwork embedding is important for some real applications.

The power law distribution property indicates that most nodes in a network are associated with a small number of edges. Consequently, it is hard to learn an effective representation for a node with limited information. How this property affects the performance of network embedding and how to improve the embeddings of the minority nodes are still largely untouched.

8.2 The Effect of Side Information

Section 5 discusses a series of network embedding algorithms that preserve side information in embedding. All the existing methods assume that there is an agreement between network structure and side information. To what extent the assumption holds in real applications, however, remains an open question. A low correlation between side information and structures may degrade the performance of network embedding. Moreover, it is interesting to explore the complementarity between network structures and side information. More often than not, each source of information may contain some knowledge that the others do not have.

Besides, in a heterogeneous information network, to measure the relevance of two objects, the meta path, a sequence of object types with edge types in between, has been widely used. However, the meta structure [88], which is essentially a directed acyclic graph of object and edge types, provides a higher-order structure constraint. This suggests a huge potential direction for improving heterogeneous information network embedding.

8.3 More Advanced Information and Tasks

In general, most network embedding algorithms are designed for general purposes, such as link prediction and node classification. These network embedding methods mainly focus on general network structures and may not be specific to some target applications. Another important research direction is to explore the possibility of designing network embedding for more specific applications. For example, is network embedding a new way to detect rumors in social networks [93], [94]? Can we use network embedding to infer social ties [95]? Each real world application has its own characteristics, and incorporating its unique domain knowledge into network embedding is key. The technical challenges here are how to model the specific domain knowledge as advanced information that can be integrated into network embedding in an effective manner.

8.4 Dynamic Network Embedding

Although many network embedding methods have been proposed, they are mainly designed for static networks. However, in real world applications, it is well recognized that many networks are evolving over time. For example, in the Facebook network, friendships between users always dynamically change over time, e.g., new edges are continuously added to the social network while some edges may be deleted. To learn the representations of nodes in a dynamic network, the existing network embedding methods have to be run repeatedly for each time stamp, which is very time consuming and may not meet the realtime processing demand. Most of the existing network embedding methods cannot be directly applied to large scale evolving networks. New network embedding algorithms, which are able to tackle the dynamic nature of evolving networks, are highly desirable.

8.5 More Embedding Spaces

The existing network embedding methods embed a network into the Euclidean space. In general, the principle of network embedding can be extended to other target spaces. For example, recently some studies [96] assume that the underlying structure of a network is in the hyperbolic space. Under this assumption, heterogeneous degree distributions and strong clustering emerge naturally, as they are the simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. Exploring other embedding spaces is another interesting research direction.

ACKNOWLEDGMENTS

This work was supported in part by the National Program on Key Basic Research Project (No. 2015CB352300), National Natural Science Foundation of China (No. U1611461, No. 61772304, No. 61521002, No. 61531006, No. 61702296), the research fund of the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, and the Young Elite Scientist Sponsorship Program by CAST. The work of Jian Pei has been supported in part by the NSERC Discovery Grant program, the Canada Research Chair program, and the NSERC Strategic Grant program. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

REFERENCES

[1] C. L. Staudt, A. Sazonovs, and H. Meyerhenke, "Networkit: A tool suite for large-scale network analysis," Netw. Sci., Cambridge University Press, vol. 4, pp. 508–530, 2016.
[2] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Mag., vol. 29, no. 3, 2008, Art. no. 93.
[3] B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations," in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 701–710.
[4] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, "Community preserving network embedding," in Proc. 31st AAAI Conf. Artif. Intell., AAAI Press, 2017, pp. 203–209.
[5] I. Herman, G. Melançon, and M. S. Marshall, "Graph visualization and navigation in information visualization: A survey," IEEE Trans. Vis. Comput. Graph., vol. 6, no. 1, pp. 24–43, Jan.-Mar. 2000.
[6] D. Wang, P. Cui, and W. Zhu, "Structural deep network embedding," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 1225–1234.
[7] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," J. Assoc. Inf. Sci. Technol., vol. 58, no. 7, pp. 1019–1031, 2007.
[8] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu, "Asymmetric transitivity preserving graph embedding," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 672–681.
[9] Y. Fu and Y. Ma, Graph Embedding for Pattern Analysis. New York, NY, USA: Springer Science & Business Media, 2012.
[10] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "Line: Large-scale information network embedding," in Proc. 24th Int. Conf. World Wide Web, 2015, pp. 1067–1077.
[11] H. Huang, J. Tang, S. Wu, L. Liu, et al., "Mining triadic closure patterns in social networks," in Proc. 23rd Int. Conf. World Wide Web, 2014, pp. 499–504.
[12] D. Cartwright and F. Harary, "Structural balance: A generalization of Heider's theory," Psychological Rev., vol. 63, no. 5, 1956, Art. no. 277.
[13] S. Wang, J. Tang, C. Aggarwal, Y. Chang, and H. Liu, "Signed network embedding in social media," in Proc. SIAM Int. Conf. Data Mining, 2017, pp. 327–335.
[14] C. Tu, W. Zhang, Z. Liu, and M. Sun, "Max-margin deepwalk: Discriminative learning of network representation," in Proc. 25th Int. Joint Conf. Artif. Intell., 2016, pp. 3889–3895.
[15] C. Yang, Z. Liu, D. Zhao, M. Sun, and E. Y. Chang, "Network representation learning with rich text information," in Proc. 24th Int. Joint Conf. Artif. Intell., 2015, pp. 2111–2117.
[16] S. Chang, W. Han, J. Tang, G.-J. Qi, C. C. Aggarwal, and T. S. Huang, "Heterogeneous network embedding via deep architectures," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 119–128.
[17] N. Natarajan and I. S. Dhillon, "Inductive matrix completion for predicting gene–disease associations," Bioinf., vol. 30, no. 12, pp. i60–i68, 2014.
[18] C. Li, J. Ma, X. Guo, and Q. Mei, "Deepcas: An end-to-end predictor of information cascades," in Proc. 26th Int. Conf. World Wide Web, 2017, pp. 577–586.
[19] S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei, "End-to-end learning of action detection from frame glimpses in videos," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2016, pp. 2678–2687.
[20] X. Yang, Y.-N. Chen, D. Hakkani-Tür, P. Crook, X. Li, J. Gao, and L. Deng, "End-to-end joint learning of natural language understanding and dialogue manager," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2017, pp. 5690–5694.
[21] R. Hu, C. C. Aggarwal, S. Ma, and J. Huai, "An embedding approach to anomaly detection," in Proc. IEEE 32nd Int. Conf. Data Eng., 2016, pp. 385–396.
[22] T. Man, H. Shen, S. Liu, X. Jin, and X. Cheng, "Predict anchor links across social networks via an embedding approach," in Proc. 25th Int. Joint Conf. Artif. Intell., 2016, pp. 1823–1829.
[23] T. Chen and Y. Sun, "Task-guided and path-augmented heterogeneous network embedding for author identification," in Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, pp. 295–304.
[24] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 3111–3119.
[25] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 1225–1234.
[26] S. Cao, W. Lu, and Q. Xu, "Deep neural networks for learning graph representations," in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 1145–1152.
[27] S. Yan, D. Xu, B. Zhang, and H.-J. Zhang, "Graph embedding: A general framework for dimensionality reduction," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit., 2005, vol. 2, pp. 830–837.
[28] J. B. Tenenbaum, V. De Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Sci., vol. 290, no. 5500, pp. 2319–2323, 2000.
[29] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Sci., vol. 290, no. 5500, pp. 2323–2326, 2000.
[30] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," in Proc. Adv. Neural Inf. Process. Syst., 2002, pp. 585–591.
[31] N. Berline, E. Getzler, and M. Vergne, Heat Kernels and Dirac Operators. New York, NY, USA: Springer Science & Business Media, 2003.
[32] X. He and P. Niyogi, "Locality preserving projections," in Proc. Adv. Neural Inf. Process. Syst., 2004, pp. 153–160.
[33] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in Proc. ICLR Workshops Track, 2013.
[34] S. Cao, W. Lu, and Q. Xu, "Grarep: Learning graph representations with global structural information," in Proc. 24th ACM Int. Conf. Inf. Knowl. Manage., 2015, pp. 891–900.
[35] M. Girvan and M. E. Newman, "Community structure in social and biological networks," Proc. Nat. Academy Sci. USA, vol. 99, no. 12, pp. 7821–7826, 2002.
[36] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proc. Adv. Neural Inf. Process. Syst., 2001, pp. 556–562.
[37] M. E. Newman, "Finding community structure in networks using the eigenvectors of matrices," Phys. Rev. E, vol. 74, no. 3, 2006, Art. no. 036104.
[38] O. Levy and Y. Goldberg, "Neural word embedding as implicit matrix factorization," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2177–2185.
[39] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, no. Dec, pp. 3371–3408, 2010.
[40] S. Chen, S. Niu, L. Akoglu, J. Kovacevic, and C. Faloutsos, "Fast, warped graph embedding: Unifying framework and one-click algorithm," arXiv preprint arXiv:1702.05764, 2017.
[41] P. Yanardag and S. Vishwanathan, "Deep graph kernels," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 1365–1374.
[42] M. Niepert, M. Ahmed, and K. Kutzkov, "Learning convolutional neural networks for graphs," in Proc. Int. Conf. Mach. Learn., 2016, pp. 2014–2023.
[43] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, 2015, Art. no. 436.
[44] M. Ou, P. Cui, F. Wang, J. Wang, and W. Zhu, "Non-transitive hashing with latent similarity components," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 895–904.
[45] L. Katz, "A new status index derived from sociometric analysis," Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[46] L. A. Adamic and E. Adar, "Friends and neighbors on the web," Soc. Netw., vol. 25, no. 3, pp. 211–230, 2003.
[47] C. C. Paige and M. A. Saunders, "Towards a generalized singular value decomposition," SIAM J. Numerical Anal., vol. 18, no. 3, pp. 398–405, 1981.
[48] M. Cygan, M. Pilipczuk, M. Pilipczuk, and J. O. Wojtaszczyk, "Sitting closer to friends than enemies, revisited," Theory Comput. Syst., vol. 56, no. 2, pp. 394–405, 2015.
[49] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, "Support vector machines," IEEE Intell. Syst. Appl., vol. 13, no. 4, pp. 18–28, July-Aug. 1998.
[50] T. M. Le and H. W. Lauw, "Probabilistic latent document network embedding," in Proc. IEEE Int. Conf. Data Mining, 2014, pp. 270–279.
[51] J. Chang and D. M. Blei, "Relational topic models for document networks," in Proc. Int. Conf. Artif. Intell. Statist., 2009, pp. 81–88.
[52] X. Sun, J. Guo, X. Ding, and T. Liu, "A general framework for content-enhanced network representation learning," arXiv:1610.02906, 2016.
[53] S. Pan, J. Wu, X. Zhu, C. Zhang, and Y. Wang, "Tri-party deep network representation," Netw., vol. 11, no. 9, 2016, Art. no. 12.
[54] X. Huang, J. Li, and X. Hu, "Label informed attributed network embedding," in Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, pp. 731–739.
[55] F. R. K. Chung and F. C. Graham, "Spectral graph theory," American Mathematical Soc., no. 92, 1997.
[56] X. Huang, J. Li, and X. Hu, "Accelerated attributed network embedding," in Proc. SIAM Int. Conf. Data Mining, 2017, pp. 633–641.
[57] X. Wei, L. Xu, B. Cao, and P. S. Yu, "Cross view link prediction by learning noise-resilient representation consensus," in Proc. 26th Int. Conf. World Wide Web, 2017, pp. 1611–1619.
[58] Y. Jacob, L. Denoyer, and P. Gallinari, "Learning latent representations of nodes for classifying in heterogeneous social networks," in Proc. 7th ACM Int. Conf. Web Search Data Mining, 2014, pp. 373–382.
[59] Z. Huang and N. Mamoulis, "Heterogeneous information network embedding for meta path based proximity," arXiv:1701.05291, 2017.
[60] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, "Pathsim: Meta path-based top-k similarity search in heterogeneous information networks," Proc. VLDB Endowment, vol. 4, no. 11, pp. 992–1003, 2011.
[61] L. Xu, X. Wei, J. Cao, and P. S. Yu, "Embedding of embedding (eoe): Joint embedding for coupled heterogeneous networks," in Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, pp. 741–749.
[62] A. Guille, H. Hacid, C. Favre, and D. A. Zighed, "Information diffusion in online social networks: A survey," ACM SIGMOD Record, vol. 42, no. 2, pp. 17–28, 2013.
[63] S. Bourigault, C. Lagnier, S. Lamprier, L. Denoyer, and P. Gallinari, "Learning social network embeddings for predicting information diffusion," in Proc. 7th ACM Int. Conf. Web Search Data Mining, 2014, pp. 393–402.
[64] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[65] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, "Recurrent neural network based language model," Interspeech, vol. 2, 2010, Art. no. 3.
[66] D. W. Ruck, S. K. Rogers, M. Kabrisky, M. E. Oxley, and B. W. Suter, "The multilayer perceptron as an approximation to a Bayes optimal discriminant function," IEEE Trans. Neural Netw., vol. 1, no. 4, pp. 296–298, Dec. 1990.
[67] L. Akoglu, H. Tong, and D. Koutra, "Graph based anomaly detection and description: A survey," Data Mining Knowl. Discovery, vol. 29, no. 3, pp. 626–688, 2015.
[68] R. S. Burt, "Structural holes and good ideas," Amer. J. Sociology, vol. 110, no. 2, pp. 349–399, 2004.
[69] M. Bayati, M. Gerritsen, D. F. Gleich, A. Saberi, and Y. Wang, "Algorithms for large, sparse network alignment problems," in Proc. 9th IEEE Int. Conf. Data Mining, 2009, pp. 705–710.
[70] L. Tang and H. Liu, "Relational learning via latent social dimensions," in Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2009, pp. 817–826.
[71] L. Tang and H. Liu, "Scalable learning of collective behavior based on sparse social dimensions," in Proc. 18th ACM Conf. Inf. Knowl. Manage., 2009, pp. 1107–1116.
[72] M. De Choudhury, Y.-R. Lin, H. Sundaram, K. S. Candan, L. Xie, A. Kelliher, et al., "How does the data sampling strategy impact the discovery of information diffusion in social media?" ICWSM, vol. 10, pp. 34–41, 2010.
[73] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "Arnetminer: Extraction and mining of academic social networks," in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2008, pp. 990–998.
[74] A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, "Automating the construction of internet portals with machine learning," Inf. Retrieval, vol. 3, no. 2, pp. 127–163, 2000.
[75] J. Leskovec, J. Kleinberg, and C. Faloutsos, "Graph evolution: Densification and shrinking diameters," ACM Trans. Knowl. Discovery Data, vol. 1, no. 1, 2007, Art. no. 2.
[76] J. Leskovec and A. Krevl, "SNAP datasets: Stanford large network dataset collection (2014)," 2016. [Online]. Available: http://snap.stanford.edu/data
[77] M. Mahoney, "Large text compression benchmark," 2011, http://www.mattmahoney.net/text/text.html
[78] B.-J. Breitkreutz, C. Stark, T. Reguly, L. Boucher, A. Breitkreutz, M. Livstone, R. Oughtred, D. H. Lackner, J. Bähler, V. Wood, et al., "The biogrid interaction database: 2008 update," Nucleic Acids Res., vol. 36, no. suppl_1, pp. D637–D640, 2007.
[79] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "Liblinear: A library for large linear classification," J. Mach. Learn. Res., vol. 9, no. Aug, pp. 1871–1874, 2008.
[80] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, "Nus-wide: A real-world web image database from National University of Singapore," in Proc. ACM Int. Conf. Image Video Retrieval, 2009, Art. no. 48.
[81] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput., vol. 15, no. 6, pp. 1373–1396, 2003.
[82] L. Lü and T. Zhou, "Link prediction in complex networks: A survey," Physica A: Statist. Mech. Appl., vol. 390, no. 6, pp. 1150–1170, 2011.
[83] L. Getoor and C. P. Diehl, "Link mining: A survey," ACM SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 3–12, 2005.
[84] J. MacQueen, et al., "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Math. Statist. Probability, 1967, pp. 281–297.
[85] D. Cai, X. He, J. Han, and T. S. Huang, "Graph regularized non-negative matrix factorization for data representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1548–1560, Aug. 2011.
[86] A. L. Traud, P. J. Mucha, and M. A. Porter, "Social structure of Facebook networks," Physica A: Statist. Mech. Appl., vol. 391, no. 16, pp. 4165–4180, 2012.
[87] F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu, "Learning deep representations for graph clustering," in Proc. AAAI, 2014, pp. 1293–1299.
[88] Z. Huang, Y. Zheng, R. Cheng, Y. Sun, N. Mamoulis, and X. Li, "Meta structure: Computing relevance in large heterogeneous information networks," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 1595–1604.
[89] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, "Freebase: A collaboratively created graph database for structuring human knowledge," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2008, pp. 1247–1250.
[90] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
[91] K. W. Lim and W. Buntine, "Bibliographic analysis with the citation network topic model," arXiv:1609.06826, 2016.
[92] A. R. Benson, D. F. Gleich, and J. Leskovec, "Higher-order organization of complex networks," Sci., vol. 353, no. 6295, pp. 163–166, 2016.
[93] E. Seo, P. Mohapatra, and T. Abdelzaher, "Identifying rumors and their sources in social networks," SPIE Defense Secur. Sens., pp. 83891I–83891I, 2012.
[94] Q. Zhang, S. Zhang, J. Dong, J. Xiong, and X. Cheng, "Automatic detection of rumor on social network," in Proc. Natural Lang. Process. Chinese Comput., 2015, pp. 113–122.
[95] J. Tang, T. Lou, and J. Kleinberg, "Inferring social ties across heterogenous networks," in Proc. 5th ACM Int. Conf. Web Search Data Mining, 2012, pp. 743–752.
[96] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá, "Hyperbolic geometry of complex networks," Phys. Rev. E, vol. 82, no. 3, 2010, Art. no. 036106.

Peng Cui received the PhD degree from Tsinghua University, in 2010. He is an associate professor with tenure at Tsinghua University. His research interests include network representation learning, human behavioral modeling, and social-sensed multimedia computing. He has published more than 60 papers in prestigious conferences and journals in data mining and multimedia. His recent research received the SIGKDD 2016 Best Paper Finalist, ICDM 2015 Best Student Paper Award, SIGKDD 2014 Best Paper Finalist, IEEE ICME 2014 Best Paper Award, ACM MM12 Grand Challenge Multimodal Award, and MMM13 Best Paper Award. He is an associate editor of the IEEE Transactions on Knowledge and Data Engineering, the IEEE Transactions on Big Data, the ACM Transactions on Multimedia Computing, Communications, and Applications, the Elsevier Journal on Neurocomputing, etc. He was the recipient of the ACM China Rising Star Award in 2015.
Xiao Wang received the PhD degree from the School of Computer Science and Technology, Tianjin University, Tianjin, China, in 2016. He is a postdoctoral researcher with the Department of Computer Science and Technology, Tsinghua University, Beijing, China. He received the China Scholarship Council Fellowship in 2014 and visited Washington University in St. Louis as a joint training student from Nov. 2014 to Nov. 2015. His current research interests include data mining, social network analysis, and machine learning. He has published more than 20 papers in conferences such as AAAI and IJCAI, and in journals such as the IEEE Transactions on Cybernetics. His research is sponsored by the National Science Foundation of China.

Wenwu Zhu received the PhD degree from New York University, New York, NY, in 1996. He is currently a professor and the vice chair of the Department of Computer Science, Tsinghua University, Beijing, China. Prior to his current position, he was a senior researcher and a research manager with Microsoft Research Asia, Beijing, China. His current research interests include the areas of multimedia computing, communications, and networking, as well as big data. He has been serving as the editor-in-chief of the IEEE Transactions on Multimedia (T-MM) since January 1, 2017. He was the recipient of five Best Paper Awards, including T-CSVT in 2001 and ACM Multimedia 2012. He is a fellow of the IEEE, AAAS, and SPIE, and an ACM distinguished scientist.
Jian Pei is currently a vice president of JD.com and a professor with Simon Fraser University. He is in charge of the big data and smart supply chain business at JD.com. Recognized as an ACM fellow and an IEEE fellow, he is responsible for a series of inventions in data mining and database systems. He has more than 200 technical publications, which have been cited extensively. His research has generated remarkable impact substantially beyond academia. He received several prestigious awards, including the ACM SIGKDD Innovation Award, the ACM SIGKDD Service Award, the IEEE ICDM Research Contribution Award, and the IEEE ICDE Most Influential Paper Award.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.