[go: up one dir, main page]

0% found this document useful (0 votes)
14 views20 pages

A Survey On Network Embedding

This document is a survey on network embedding, which focuses on assigning low-dimensional representations to nodes in a network while preserving its structure. It categorizes various network embedding methods, reviews classical graph embedding algorithms, and discusses future research directions. The paper also highlights the importance of network embedding in facilitating advanced network analysis tasks such as node classification and link prediction.

Uploaded by

avalondancing
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views20 pages

A Survey On Network Embedding

This document is a survey on network embedding, which focuses on assigning low-dimensional representations to nodes in a network while preserving its structure. It categorizes various network embedding methods, reviews classical graph embedding algorithms, and discusses future research directions. The paper also highlights the importance of network embedding in facilitating advanced network analysis tasks such as node classification and link prediction.

Uploaded by

avalondancing
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO.

5, MAY 2019 833

A Survey on Network Embedding


Peng Cui , Xiao Wang, Jian Pei , Fellow, IEEE, and Wenwu Zhu , Fellow, IEEE

Abstract—Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network
structure. Recently, a significant amount of progresses have been made toward this emerging network analysis paradigm. In this
survey, we focus on categorizing and then reviewing the current development on network embedding methods, and point out its future
research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms
and their relationship with network embedding. Afterwards and primarily, we provide a comprehensive overview of a large number of
network embedding methods in a systematic manner, covering the structure- and property-preserving network embedding methods,
the network embedding methods with side information, and the advanced information preserving network embedding methods.
Moreover, several evaluation approaches for network embedding and some useful online resources, including the network data sets
and softwares, are reviewed, too. Finally, we discuss the framework of exploiting these network embedding methods to build an
effective system and point out some potential future directions.

Index Terms—Network embedding, graph embedding, network analysis, data science

1 INTRODUCTION

M ANY complex systems take the form of networks, such


as social networks, biological networks, and informa-
tion networks. It is well recognized that network data is often
compute such a distance using the traditional net-
work representation, we have to enumerate many
possible paths between two nodes, which is in nature
sophisticated and thus is challenging to deal with. To process a combinatorial problem. As another example, many
network data effectively, the first critical challenge is to find studies assume that a node with links to important
effective network data representation, that is, how to repre- nodes tends to be important, and vice versa. In order
sent networks concisely so that advanced analytic tasks, to evaluate the importance of a node using the tradi-
such as pattern discovery, analysis and prediction, can be tional network representation, we have to iteratively
conducted efficiently in both time and space. conduct a stochastic node traversal process until
Traditionally, we usually represent a network as a graph reaching a convergence. Such methods using the tra-
G ¼ hV; Ei, where V is a vertex set representing the nodes ditional network representation result in high compu-
in a network, and E is an edge set representing the relation- tational complexity that prevents them from being
ships among the nodes. For large networks, such as those applicable to large-scale real-world networks.
with billions of nodes, the traditional network representa-  Low parallelizability. Parallel and distributed comput-
tion poses several challenges to network processing and ing is de facto to process and analyze large-scale data.
analysis. Network data represented in the traditional way,
however, casts severe difficulties to design and imple-
 High computational complexity. The nodes in a network mentation of parallel and distributed algorithms.
are related to each other to a certain degree, encoded The bottleneck is that nodes in a network are coupled
by the edge set E in the traditional network represen- to each other explicitly reflected by E. Thus, distribut-
tation. These relationships cause most of the network ing different nodes in different shards or servers often
processing or analysis algorithms either iterative or causes demandingly high communication cost among
combinatorial computation steps, which result in servers, and holds back speed-up ratio. Although
high computational complexity. For example, a popu- some limited progress is made on graph paralleliza-
lar way is to use the shortest or average path length tion by subtly segmenting large-scale graphs [1],
between two nodes to represent their distance. To the luck of these methods heavily depends on the
topological characteristics of the underlying graphs.
 P. Cui, X. Wang, and W. Zhu are with the Department of Computer Science  Inapplicability of machine learning methods. Recently,
and Technology, Tsinghua University, Beijing 100084, China. machine learning methods, especially deep learning,
E-mail: {cuip, wwzhu}@tsinghua.edu.cn, wangxiao007@mail.tsinghua.edu.cn.
 J. Pei is with the School of Computing Science, Simon Fraser University,
are very powerful in many areas. These methods
Burnaby, BC V5A 1S6, Canada. E-mail: jpei@cs.sfu.ca. provide standard, general and effective solutions to
Manuscript received 23 Nov. 2017, revised 11 May 2018, accepted 31 May a broad range of problems. For network data repre-
2018, Date of publication 22 June 2018; date of current version 29 Mar. 2019. sented in the traditional way, however, most of the
(Corresponding authors: Peng Cui and Xiao Wang.) off-the-shelf machine learning methods may not
Recommended for acceptance by J. Xu. applicable. Those methods usually assume that data
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below. samples can be represented by independent vectors
Digital Object Identifier no. 10.1109/TKDE.2018.2849727 in a vector space, while the samples in network data
1041-4347 ß 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See ht_tp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
834 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

To tackle the challenge, substantial effort has been com-


mitted to develop novel network embedding, i.e., learning
low-dimensional vector representations for network nodes.
In the network embedding space, the relationships among
the nodes, which were originally represented by edges or
other high-order topological measures in graphs, is cap-
tured by the distances between nodes in the vector space,
and the topological and structural characteristics of a node
are encoded into its embedding vector. An example is
shown in Fig. 1. After embedding the karate club network
into a two-dimensional space, the similar nodes marked by
the same color are close to each other in the embedding
space, demonstrating that the network structure can be well
modeled in the two-dimensional embedding space.
Network embedding, as a promising way of network
representation, is capable of supporting subsequent net-
work processing and analysis tasks such as node classifica-
tion [2], [3], node clustering [4], network visualization [5],
[6] and link prediction [7], [8]. If this goal is fulfilled, the
advantages of network embedding over traditional network
representation methods are apparent, as shown in Fig. 2.
The traditional topology based network representation usu-
ally directly uses the observed adjacency matrix, which may
contain noise or redundant information. The embedding
based representation first aims to learn the dense and con-
tinuous representations of nodes in a low dimensional
space, so that the noise or redundant information can be
reduced and the intrinsic structure information can be pre-
served. As each node is represented by a vector containing
its information of interest, many iterative or combinatorial
Fig. 1. An example of network embedding on a karate network. Images problems in network analysis can be tackled by computing
are extracted from DeepWalk [3]. mapping functions, distance metrics or operations on the
embedding vectors, and thus avoid high complexity. As the
(i.e., the nodes) are dependant to each other to some nodes are not coupling any more, it is convenient to apply
degree determined by E. Although we can simply main-stream parallel computing solutions for large-scale
represent a node by its corresponding row vector in network analysis. Furthermore, network embedding can
the adjacency matrix of the network, the extremely open the opportunities for network analysis to be benefited
high dimensionality of such a representation in a from the rich literature of machine learning. Many off-the-
large graph with many nodes makes the in sequel shelf machine learning methods such as deep learning mod-
network processing and analysis difficult. els can be directly applied to solve network problems.
The traditional network representation has become a bot- In order to make the embedding space well support net-
tleneck in large-scale network processing and analysis nowa- work analysis tasks, there are two goals for network embed-
days. Representing the relationships explicitly using a set of ding. First, the original network can be reconstructed from
edges in the traditional representation is the upmost barrier. the learned embedding space. It requires that, if there is an

Fig. 2. A comparison between network topology based network analysis and network embedding based network analysis.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 835

Fig. 3. An overview of different settings of network embedding.

edge or relationship between two nodes, then the distance of 2.1 The Categorization of Network Embedding
these two nodes in the embedding space should be relatively Methods
small. In this way, the network relationships can be well pre- As mentioned before, network embedding usually has two
served. Second, the learned embedding space can effectively goals, i.e., network reconstruction and network inference.
support network inference, such as predicting unseen links, The traditional graph embedding methods, mainly focusing
identifying important nodes, and inferring node labels. It on network reconstruction, has been widely studied. We will
should be noted that an embedding space with only the goal briefly review those methods in Section 3. Fu and Ma [9]
of network reconstruction is not sufficient for network infer- present a more detailed survey. In this paper, we focus on
ence. Taking the link prediction problem as an example, if the recently proposed network embedding methods aiming
we only consider the goal of network reconstruction, the to address the goal of network inference. The categorization
embedding vectors learned by SVD tend to fit all the structure of the related works is shown in Fig. 3.
observed links and zero values in the adjacency matrix,
which may lead to overfitting and cannot infer unseen links. 2.1.1 Structure and Property Preserving Network
In this paper, we survey the state-of-the-art works on net- Embedding
work embedding and point out future research directions.
Among all the information encoded in a network, network
In Section 2, we first categorize network embedding meth-
structures and properties are two crucial factors that largely
ods according to the types of information preserved in
affect network inference. Consider a network with only
embedding, and summarize the commonly used models.
topology information. Many network analysis tasks, such as
We briefly review the traditional graph embedding meth- identifying important nodes and predicting unseen links,
ods and discuss the difference of these methods with the can be conducted in the original network space. However, as
recent network embedding methods in Section 3. Then, in mentioned before, directly conducting these tasks based on
Sections 4, 5 and 6, we respectively review the methods on network topology has a series of problems, and thus poses a
structure and property preserving network embedding, question that whether we can learn a network embedding
network embedding with side information, as well as space purely based on the network topology information,
advanced information preserving network embedding. In such that these tasks can be well supported in this low
Section 7, we present a few evaluation scenarios and some dimensional space. Motivated by this, attempts are proposed
online resources, including the data sets and codes, for net- to preserve rich structural information into network embed-
work embedding. We conclude and discuss a series of pos- ding, from nodes and links [10] to neighborhood struc-
sible future directions in Section 8. ture [3], high-order proximities of nodes [6], and community
structures [4]. All these types of structural information have
2 CATEGORIZATION AND THE MODELS been demonstrated useful and necessary in various network
To support network inference, more information beyond analysis tasks. Besides this structural information, network
nodes and links needs to be preserved in embedding space. properties in the original network space are not ignorable in
Most research works on network embedding develop along modeling the formation and evolution of networks. To name
this line in recent years. There are multiple ways to catego- a few, network transitivity (i.e. triangle closure) is the driving
rize them. In this paper, according to the types of information force of link formation in networks [11], and structural bal-
that are preserved in network embedding, we categorize the ance property plays an important role in the evolution of
existing methods into three categories, that is, (1) network signed networks [12]. Preserving these properties in a net-
structure and properties preserving network embedding, (2) work embedding space is, however, challenging due to the
network embedding with side information and (3) advanced inhomogeneity between the network space and the embed-
information preserving network embedding. ding vector space. Some recent studies begin to look into this

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
836 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

problem and demonstrate the possibility of aligning these goals. The commonly used models include matrix factoriza-
two spaces at the property level [8], [13]. tion, random walk, deep neural networks and their variations.

2.1.2 Network Embedding with Side Information 2.2.1 Matrix Factorization


Besides network topology, some types of networks are An adjacency matrix is commonly used to represent the
accompanied with rich side information, such as node con- topology of a network, where each column and each row rep-
tent or labels in information networks [14], node and edge resent a node, and the matrix entries indicate the relation-
attributes in social networks [15], as well as node types in ships among nodes. We can simply use a row vector or
heterogeneous networks [16]. Side information provides column vector as the vector representation of a node, but the
useful clues for characterizing relationships among network formed representation space is N-dimensional, where N is
nodes, and thus is helpful in learning embedding vector the total number of nodes. Network embedding, aiming to
spaces. In the cases where the network topology is relatively learn a low-dimensional vector space for a network, is even-
sparse, the importance of the side information as comple- tually to find a low-rank space to represent a network, in con-
mentary information sources is even more substantial. trast with the N-dimensional space. In this sense, matrix
Methodologically, the main challenge is how to integrate factorization methods, with the same goal of learning low-
and balance the topological and side information in network rank space for the original matrix, can naturally be applied to
embedding. Some multimodal and multisource fusion tech- solve this problem. In the series of matrix factorization mod-
niques are explored in this line of research [15], [17]. els, Singular Value Decomposition (SVD) is commonly used
in network embedding due to its optimality for low-rank
2.1.3 Advanced Information Preserving Network approximation [8]. Non-negative matrix factorization is often
Embedding used because of its advantages as an additive model [4].
In the previous two categories, most methods learn network
embedding in an unsupervised manner. That is, we only take 2.2.2 Random Walk
the network structure, properties, and side information into As mentioned before, preserving network structure is a fun-
account, and try to learn an embedding space to preserve the damental requirement for network embedding. Neighbor-
information. In this way, the learned embedding space is gen- hood structure, describing the local structural characteristics
eral and, hopefully, able to support various network applica- of a node, is important for network embedding. Although the
tions. If we regard network embedding as a way of network adjacency vector of a node encodes the first-order neighbor-
representation learning, the formation of the representation hood structure of a node, it is usually a sparse, discrete, and
space can be further optimized and confined towards differ- high-dimensional vector due to the nature of sparseness in
ent target problems. Realizing this idea leads to supervised or large-scale networks. Such a representation is not friendly to
pseudo supervised information (i.e. the advanced informa- subsequent applications. In the field of natural language
tion) in the target scenarios. Directly designing a framework processing, the word representation also suffers from similar
of representation learning for a particular target scenario is drawbacks. The development of Word2Vector [24] signifi-
also known as an end-to-end solution [18], where high- cantly improves the effectiveness of word representation by
quality supervised information is exploited to learn the latent transforming sparse, discrete and high-dimensional vectors
representation space from scratch. End-to-end solutions have into dense, continuous and low-dimensional vectors. The
demonstrated their advantages in some fields, such as com- intuition of Word2Vector is that a word vector should be able
puter vision [19] and natural language processing (NLP) [20]. to reconstruct the vectors of its neighborhood words which
Similar ideas are also feasible for network applications. Tak- are defined by co-occurence rate. Some methods in network
ing the network node classification problem as an example, if embedding borrow these ideas. The key problem is how to
we have the labels of some network nodes, we can design a define “neighborhood” in networks.
solution with network structure as input, node labels as To make analogy with Word2Vector, random walk mod-
supervised information, and embedding representation as els are exploited to generate random paths over a network.
latent middle layer, and the resulted network embedding By regarding a node as a word, we can regard a random path
is specific for node classification. Some recent works demon- as a sentence, and the node neighborhood can be identified
strate the feasibility in applications such as cascading predic- by co-occurence rate as in Word2Vector. Some representative
tion [18], anomaly detection [21], network alignment [22] and methods include DeepWalk [3] and Node2Vec [25].
collaboration prediction [23].
In general, network structures and properties are the fun- 2.2.3 Deep Neural Networks
damental factors that need to be considered in network By definition, network embedding is to transform the origi-
embedding. Meanwhile, side information on nodes and nal network space into a low-dimensional vector space. The
links, as well as advanced information from target problem intrinsic problem is to learn a mapping function between
is helpful to enable the learned network embedding work these two spaces. Some methods, like matrix factorization,
well in real applications. assume the mapping function to be linear. However, the for-
mation process of a network is complicated and highly non-
2.2 Commonly Used Models in Network Embedding linear, thus a linear function may not be adequate to map
To transform networks from original network space to the original network to an embedding space.
embedding space, different models can be adopted to incor- If seeking for an effective non-linear function learning
porate different types of information or address different model, deep neural networks are certainly useful options

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 837
X X
because of their huge successes in other fields. The key chal- min kxi  Wij xj k2 ; (2)
W
lenges are how to make deep models fit network data, and i j
how to impose network structure and property-level con- where the weight Wij measures the contribution of the
straints on deep models. Some representative methods, entry xj to the reconstruction of entry xi . Finally, in the
such as SDNE [6], SDAE [26], and SiNE [13], propose deep low-dimensional space, LLE constructs a neighborhood-
learning models for network embedding to address these preserving mapping based on locally linear reconstruc-
challenges. At the same time, deep neural networks are also tion as follows:
well known for their advantages in providing end-to-end X X
solutions. Therefore, in the problems where advanced infor- min kui  Wij uj k2 : (3)
U
mation is available, it is natural to exploit deep models to i j
come up with an end-to-end network embedding solution. By optimizing the above function, the low-dimensional
For instance, some deep model based end-to-end solutions representation matrix U, which preserves the neighborhood
are proposed for cascade prediction [18] and network structure, can be obtained.
alignment [22]. Laplacian eigenmaps (LE) [30] also begins with con-
The network embedding models are not limited to those structing a graph using -neighborhoods or K nearest
mentioned in this subsection. Moreover, the three kinds of neighbors. Then the heat kernel [31] is utilized to choose the
models are not mutually exclusive, and their combinations weight Wij of nodes i and j in the graph. Finally, the repre-
are possible to make new solutions. More models and sentation ui of node i can be obtained by minimizing the fol-
details will be discussed in later sections. lowing function:
X
3 NETWORK EMBEDDING VERSUS GRAPH kui  uj k2 Wij ¼ trðUT LUÞ; (4)
EMBEDDING i;j

The goal of graph embedding is similar as network embed- where L ¼ D  W is the Laplacian matrix, and D is the
ding, that is, to embed a graph into a low-dimensional vec- P
diagonal matrix with Dii ¼ j Wji . In addition, the con-
tor space [27]. There is a rich literature in graph embedding. straint UT DU ¼ I is introduced to avoid trivial solutions.
Fu and Ma [9] provide a thorough review on the traditional Furthermore, the locality preserving projection (LPP) [32], a
graph embedding methods. Here we only present some rep- linear approximation of the nonlinear LE, is proposed. Also,
resentative and classical methods on graph embedding, it introduces a transformation matrix A such that the repre-
aiming to demonstrate the critical differences between sentation ui of entry xi is ui ¼ AT xi . LPP computes the
graph embedding and the current network embedding. transformation matrix A first, and finally the representation
ui can be obtained.
3.1 Representative Graph Embedding Methods These methods are extended in the rich literature of
Graph embedding methods are originally studied as dimen- graph embedding by considering different characteristics of
sion reduction techniques. A graph is usually constructed the constructed graphs [9].
from a feature represented data set, like image data set. Iso-
map [28] first constructs a neighborhood graph G using con- 3.2 Major Differences
nectivity algorithms such as K nearest neighbors (KNN), Network embedding and graph embedding have substantial
i.e., connecting data entries i and j if i is one of the K nearest differences in objective and assumptions. As mentioned
neighbors of j. Then based on G, the shortest path dG ij of before, network embedding has two goals, i.e. reconstructing
entries i and j in G can be computed. Consequently, for all original networks and support network inference. The objec-
the N data entries in the data set, we have the matrix of tive functions of graph embedding methods mainly target the
graph distances DG ¼ fdG ij g. Finally, the classical multidi- goal of graph reconstruction. As discussed before, the embed-
mensional scaling (MDS) method is applied to DG to obtain ding space learned for network reconstruction is not necessar-
the coordinate vector ui for entry i, which aims to minimize ily good for network inference. Therefore, graph embedding
the following function: can be regarded as a special case of network embedding, and
the recent research progress on network embedding pays
N X
X N
2 more attention to network inference.
ðdG
ij  kui  uj kÞ : (1)
i¼1 j¼1
Moreover, graph embedding mostly works on graphs con-
structed from feature represented data sets, where the prox-
Indeed, Isomap learns the representation ui of entry i, imity among nodes encoded by the edge weights are well
which approximately preserves the geodesic distances of defined in the original feature space. In contrast, network
the entry pairs in the low-dimensional space. embedding mostly works on naturally formed networks,
The key problem of Isomap is its high complexity due to such as social networks, biology networks, and e-commerce
the computing of pair-wise shortest pathes. Locally linear networks. In those networks, the proximities among nodes
embedding (LLE) [29] is proposed to eliminate the need to are not explicitly or directly defined. The definition of node
estimate the pairwise distances between widely separated proximities depends on specific analytic tasks and application
entries. LLE assumes that each entry and its neighbors lie scenarios. Therefore, we have to incorporate rich information,
on or close to a locally linear patch of a mainfold. To charac- such as network structures, properties, side information and
terize the local geometry, each entry can be reconstructed advanced information, in network embedding to facilitate dif-
from its neighbors as follows: ferent problems and applications.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
838 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

Fig. 4. Overview of DeepWalk. Image extracted from [3].

In the rest of the paper, we mainly focus on the network representation learning model, is adopted by DeepWalk to
embedding methods with the goal of supporting network learn the representations of nodes. Specifically, as shown in
inference. Fig. 4, DeepWalk adopts a truncated random walk on a net-
work to generate a set of walk sequences. For each walk
4 STRUCTURE AND PROPERTY PRESERVING sequence s ¼ fv1 ; v2 ; :::; vs g, following Skip-Gram, Deep-
Walk aims to maximize the probability of the neighbors of
NETWORK EMBEDDING node vi in this walk sequence as follows:
In essence, one basic requirement of network embedding is
to appropriately preserve network structures and capture max Prðfviw ; :::; viþw gnvi jfðvi ÞÞ ¼ Piþw
j¼iw;j6¼i Prðvj jfðvi ÞÞ;
f
properties of networks. Often, network structures include (5)
first-order structure and higher-order structure, such as where w is the window size, fðvi Þ is the current representa-
second-order structure and community structure. Networks tion of vi and fviw ; :::; viþw gnvi is the local context nodes of
with different types have different properties. For example, vi . Finally, hierarchical soft-max [33] is used to efficiently
directed networks have the asymmetric transitivity prop- infer the embeddings.
erty. The structural balance theory is widely applicable to Node2vec [25] demonstrates that DeepWalk is not expres-
signed networks. sive enough to capture the diversity of connectivity patterns
In this section, we review the representative methods of in a network. Node2vec defines a flexible notion of a node’s
structure preserving network embedding and property pre- network neighborhood and designs a second order random
serving network embedding. walk strategy to sample the neighborhood nodes, which can
smoothly interpolate between breadth-first sampling (BFS)
4.1 Structure Preserving Network Embedding and depth-first sampling (DFS). Node2vec is able to learn the
Network structures can be categorized into different groups representations that embed nodes with same network com-
that present at different granularities. The commonly munity closely, and to learn representations where nodes
exploited network structures in network embedding include sharing similar roles have similar embeddings.
neighborhood structure, high-order node proximity and net- LINE [10] is proposed for large scale network embed-
work communities. ding, and can preserve the first and second order proxim-
DeepWalk [3] is proposed for learning the representations ities. The first order proximity is the observed pairwise
of nodes in a network, which is able to preserve the neighbor proximity between two nodes, such as the observed edge
structures of nodes. DeepWalk discovers that the distribu- between nodes 6 and 7 in Fig. 5. The second order proximity
tion of nodes appearing in short random walks is similar to is determined by the similarity of the “contexts” (neighbors)
the distribution of words in natural language. Motivated by of two nodes. For example, the second order similarity
this observation, Skip-Gram model [24], a widely used word between nodes 5 and 6 can be obtained by their neighbor-
hoods 1, 2, 3, and 4 in Fig. 5. Both the first order and second
order proximities are important in measuring the relation-
ships between two nodes. The first order proximity can be
measured by the joint probability distribution between two
nodes vi and vj as
1
p1 ðvi ; vj Þ ¼ : (6)
1 þ expðuTi uj Þ

The second order proximity is modeled by the probability of


the context node vj being generated by node vi , that is,

expð iÞ
uTj u
Fig. 5. An example of the first-order and second-order structures in a p2 ðvj jvi Þ ¼ P : (7)
k  i ÞÞ
expðu Tu
network. Image extracted from [10]. k

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 839

model [36] to factorize the pairwise node similarity matrix


and learn the representations of nodes. Meanwhile, the
community structure is detected by modularity maximiza-
tion [37]. Then, based on the assumption that if the repre-
sentation of a node is similar to that of a community, the
node may have a high propensity to be in this community,
they introduce an auxiliary community representation
matrix to bridge the representations of nodes with the com-
munity structure. In this way, the learned representations of
nodes are constrained by both the microscopic structure
and community structure, which contains more structural
information and becomes more discriminative.
The aforementioned methods mainly adopt the shallow
Fig. 6. The framework of SDNE. Image extracted from [6]. models, consequently, the representation ability is limited.
SDNE [6] proposes a deep model for network embedding,
The conditional distribution implies that nodes with similar so as to address the high non-linearity, structure-preserv-
distributions over the contexts are similar to each other. By ing, and sparsity issues. The framework is shown in Fig. 6.
minimizing the KL-divergence of the two distributions and Specifically, SDNE uses the deep autoencoder with multiple
the empirical distributions respectively, the representations non-linear layers to preserve the neighbor structures of
of nodes that are able to preserve the first and second order nodes. Given the input adjacency nodes xi of node i, the hid-
proximities can be obtained. den representations for each layer can be obtained by
Considering that LINE only preserves the first-order and
ð1Þ
second-order proximities, GraRep [34] demonstrates that yi ¼ sðWð1Þ xi þ bð1Þ Þ
k-step (k > 2) proximities should also be captured when ðkÞ ðk1Þ
(9)
yi ¼ sðWðkÞ yi þ bðkÞ Þ; k ¼ 2; :::; K:
constructing the global representations of nodes. Given the
adjacency matrix A, the k-step probability transition matrix Then the output representation ^ xi can be obtained by revers-
can be computed by Ak ¼ A:::A k
|fflffl{zfflffl} , whose element Aij refers ing the calculation process of encoder. To impose more pen-
k alty to the reconstruction error of the non-zero elements
to the transition probability pk ðjjiÞ from a current node i to
than that of zero elements, SDNE introduces the penalty
a context node j and the transition consists of k steps. More-
vector bi ¼ fbij gnj¼1 (bij is larger than a threshold if there is
over, motivated by the Skip-Gram model [24], the k-step
an edge between nodes i and j) and gives rise to the follow-
loss function of node i is defined as
ing function that can preserve the second-order proximity
! X
X
Lk ðiÞ ¼ pk ðjjiÞlog sðuj ui Þ þ Ej0 pk ðV Þ ½log sðuTi uj0 Þ;
T L2nd ¼ xi  xi Þ  bi k2 :
kð^ (10)
j i

(8) To preserve the first-order proximity of nodes, the idea of


where sðxÞ ¼ ð1 þ ex Þ1 , pk ðV Þ is the distribution over the Laplacian eigenmaps [30] is adopted. By exploiting the first-
nodes in the network and j0 is the node obtained from negative order and second-order proximities jointly into the learning
sampling. Furthermore, GraRep reformulates the loss func- process, the representations of nodes can be finally obtained.
tion as the matrix factorization problem, for each k-step loss Cao et al. [26] propose a network embedding method to
function, SVD can be directly used to infer the representations capture the weighted graph structure and represent nodes
of nodes. By concentrating the representations learned from of non-linear structures. As shown in Fig. 7, instead of
each function, the global representations can be obtained. adopting the previous sampling strategy that needs to
Wang et al. [4] propose a modularized nonnegative determine certain hyper parameters, they considers a ran-
matrix factorization (M-NMF) model for network embed- dom surfing model motivated by the PageRank model.
ding, which aims to preserve both the microscopic struc- Based on this random surfing model, the representation of a
ture, i.e., the first-order and second-order proximities of node can be initiatively constructed by combining the
nodes, and the mesoscopic community structure [35]. To weighted transition probability matrix. After that, the PPMI
preserve the microscopic structure, they adopt the NMF matrix [38] can be computed. Finally, the stacked denoising

Fig. 7. Overview of the method proposed by Cao et al. [26]. Image extracted from [26].

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
840 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

autoencoders [39] that partially corrupt the input data


before taking the training step are applied to learn the latent
representations.
In order to make a general framework on network embed-
ding, Chen et al. [40] propose a network embedding frame-
work that unifies some of the previous algorithms, such as
LE, DeepWalk and Node2vec. The proposed framework,
denoted by GEM-D ½hðÞ; gðÞ; dð; Þ, involves three important
building blocks: hðÞ is a node proximity function based on
the adjacency matrix; gðÞ is a warping function that warps
the inner products of network embeddings; and dð; Þ meas-
ures the differences between h and g. Furthermore, they dem- Fig. 8. The framework of SiNE. Image extracted from [13].
onstrate that the high-order proximity for hðÞ and the
exponential function for gðÞ are more important for a network the final similarity between two nodes can be aggregated
embedding algorithm. Based on these observations, they pro- from fSijm gMm¼1 . Finally they approximate the aggregated
Q
pose UltimateWalk ¼ GEM-D ½ ðLÞ ; expðxÞ; dwf ð; Þ, where similarity to the semantic similarity based on the observa-
QðLÞ tion that if two nodes have a large semantic similarity, at
is a finite-step transition matrix, expðxÞ is an exponential
function and dwf ð; Þ is the warped Frobenius norm. least one of the similarities Sijm from the hash tables is large,
otherwise, all of the similarities are small.
The main goal of the aforementioned methods is to learn
Preserving the asymmetric transitivity property of
the embedding for nodes, currently there are some methods
directed network is considered by HOPE [8]. Asymmetric
proposed to learn the whole-graph embedding. For exam-
transitivity indicates that, if there is a directed edge from
ple, in [41], they define some connected and non-isomorphic
node i to node j and a directed edge from j to v, there is
induced sub-graphs and different networks consist of these
likely a directed edge from i to v, but not from v to i. In
sub-graphs. Then language modeling method, such as Skip-
order to measure this high-order proximity, HOPE summa-
Gram, can be applied to the edit-distance graph of these
rizes four measurements in a general formulation, that is,
sub-graphs. Therefore, the representations of these sub-
Katz Index [45], Rooted PageRank [7], Common Neigh-
graphs can be learned and the similarities of different net-
bors [7], and Adamic-Adar [46]. With the high-order prox-
works can be captured. PATCHY-SAN [42] is proposed to
imity, SVD can be directly applied to obtain the low
learn the embedding for a whole graph based on convolu-
dimensional representations. Furthermore, the general for-
tional neural network (CNN) [43], so as to deal with the
mulation of high-order proximity enables HOPE to trans-
whole graph related tasks. In order to make the traditional
form the original SVD problem into a generalized SVD
CNN compatible with the network data, they elaborately
problem [47], such that the time complexity of HOPE is
design several network data preprocessing steps, such as
largely reduced, which means HOPE is scalable for large
node sequence selection and graph normalization. In this
scale networks.
way, the network topology can be transformed to the forma-
SiNE [13] is proposed for signed network embedding,
tion for CNN.
which considers both positive and negative edges in a net-
In summary, many network embedding methods aim to
work. Due to the negative edges, the social theories on signed
preserve the local structure of a node, including neighbor-
network, such as structural balance theory [12], [48], are very
hood structure, high-order proximity as well as community
different from the unsigned network. The structural balance
structure, in the latent low-dimensional space. Both linear
theory demonstrates that users in a signed social network
and non-linear models are attempted, demonstrating the
should be able to have their “friends” closer than their
large potential of deep models in network embedding.
“foes”. In other words, given a triplet ðvi ; vj ; vk Þ with edges
eij ¼ 1 and eik ¼ 1, the similarity fðvi ; vj Þ between nodes vi
4.2 Property Preserving Network Embedding and vj is larger than fðvi ; vk Þ. To model the structural balance
Among the rich network properties, the properties that are phenomenon, a deep learning model consisting of two deep
crucial for network inference are the focus in property pre- networks with non-linear functions is designed to learn the
serving network embedding. Specifically, most of the exist- embeddings and preserve the network structure property,
ing property preserving network embedding methods focus which is consistent with the extended structural balance the-
on network transitivity in all types of networks and the ory. The framework is shown in Fig. 8.
structural balance property in signed networks. The methods reviewed in this subsection demonstrate
Ou et al. [44] aim to preserve the non-transitivity prop- the importance of maintaining network properties in net-
erty via latent similarity components. The non-transitivity work embedding space, especially the properties that
property declares that, for nodes A, B and C in a network largely affect the evolution and formation of networks. The
where ðA; BÞ and ðB; CÞ are similar pairs, ðA; CÞ may be a key challenge in is how to address the disparity and hetero-
dissimilar pair. For example, in a social network, a student geneity of the original network space and the embedding
may connect with his classmates and his family, while his vector space at property level.
classmates and family are probably very different. To
address this, they use a set of linear projection matrices to 4.3 Summary
extract M hash tables, and thus, each pair of nodes can have Generally, most of the structure and property preserving
M similarities fSijm gM
m¼1 based on those hash tables. Then methods take high order proximities of nodes into account,

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 841

which demonstrate the importance of preserving high order


structures in network embedding. The difference is the
strategy of obtaining the high order structures. Some meth-
ods implicitly preserve high-order structure by assuming a
generative mechanism from a node to its neighbors, while
some other methods realize this by explicitly approximating
high-order proximities in the embedding space. As topol-
ogy structures are the most notable characteristic of net-
works, structure-preserving network methods embody a
large part of the literature. Comparatively, property pre-
serving network embedding is a relatively new research Fig. 9. The augmented network proposed by Sun et al. [52]. Image
topic and is only studied lightly. As network properties usu- extracted from [52].
ally drive the formation and evolution of networks, it shows
great potential for future research and applications. DeepWalk is equivalent to factorizing the matrix M whose
element Mij ¼ log ð½ei ðA þ A2 þ ::: þ At Þj =tÞ, where A is the
5 NETWORK EMBEDDING WITH SIDE INFORMATION adjacency matrix, t denotes the t steps in a random walk
and ei is a row vector where all entries are 0 except the i-th
Besides network structures, side information is another
entry is 1. Then, based on the DeepWalk-derived matrix fac-
important information source for network embedding. Side
torization and motivated by the inductive matrix comple-
information in the context of network embedding can be
tion [17], they incorporate rich text information T into
divided into two categories: node content and types of
network embedding as follows:
nodes and edges. In this section, we review the methods
that take side information into network embedding. 
min kM  WT HTk2F þ ðkWk2F þ kHk2F Þ: (12)
W;H 2
5.1 Network Embedding with Node Content
In some types of networks, like information networks, Finally, they concatenate the optimal W and HT as the rep-
nodes are acompanied with rich information, such as node resentations of nodes.
labels, attributes or even semantic descriptions. How to TADW suffers from high computational cost and the
combine them with the network topology in network node attributes just simply incorporated as unordered fea-
embedding arouses considerable research interests. tures lose the much semantic information. Sun et al. [52]
Tu et al. [14] propose a semi-supervised network embed- consider the content as a special kind of nodes, and give
ding algorithm, MMDW, by leveraging labeling information rise to an augmented network, as shown in Fig. 9. With this
of nodes. MMDW is also based on the DeepWalk-derived augmented network, they are able to model the node-node
matrix factorization. MMDW adopts support vector links and node-content links in the latent vector space. They
machines (SVM) [49] and incorporates the label information use a logistic function to model the relationship in the new
to find an optimal classifying boundary. By optimizing the augmented network, and by combining with negative sam-
max-margin classifier of SVM and matrix factorization pling, they can learn the representations of nodes in a joint
based DeepWalk simultaneously, the representations of objective function, such that the representations can pre-
nodes that have more discriminative ability can be learned. serve the network structure as well as the relationship
Le et al. [50] propose a generative model for document between the node and content.
network embedding, where the words associated with each Pan et al. [53] propose a coupled deep model that incorpo-
documents and the relationships between documents are rates network structure, node attributes and node labels into
both considered. For each node, they learn its low-rank network embedding. The architecture of the proposed model
representation ui in a low dimensional vector space, which is shown in Fig. 10. Consider a network with N nodes
can reconstruct the network structure. Also, they learn the fvi gi¼1;:::;N , where each node is associated with a set of words
representation of nodes in the topic space based on the Rela- fwi g, and some nodes may have jLj labels fci g. To exploit this
tional Topic Model (RTM) [51], where each topic z is associ- information, they aim to maximize the following function:
ated with a probability distribution over words. To integrate
the two aspects, they associate each topic z with a representa- N X
X X
tion ’z in the same low dimensional vector space and then L ¼ ð1  aÞ log P ðviþj jvi Þ
i¼1 s2S bjb;j6¼0
have the following function: (13)
X
N X jLj X
X
a log P ðwj jvi Þ þ a log P ðwj jci Þ;
expð 12 kui  ’z k2 Þ
P ðzjvi Þ ¼ P 1 2
: (11) i¼1 bjb i¼1 bjb
z expð 2 kui  ’z k Þ
where S is the random walks generated in the network and
Finally, in a unified generative process, the representations b is the window size of sequence. Specifically, function P ,
of nodes U can be learned. which captures the probability of observing contextual
Besides network structures, Yang et al. [15] propose nodes (or words) given the current node (or label), can be
TADW that takes the rich information (e.g., text) associated computed using the soft-max function. In Eq. 13, the first
with nodes into account when they learn the low dimen- term is also motivated by Skip-Gram, similar to DeepWalk,
sional representations of nodes. They first prove that which models the network structure. The second term

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
842 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

link-based and attribute-based representations, and utilize


the consensus to establish the relations between them.
Although different methods adopt different strategies to
integrate node content and network topology, they all
assume that node content provides additional proximity
information to constrain the representations of nodes.

5.2 Heterogeneous Information Network


Embedding
Different from networks with node content, heterogeneous
networks consist of different types of nodes and links. How to
unify the heterogeneous types of nodes and links in network
embedding is also an interesting and challenging problem.
Yann et al. [58] propose a heterogeneous social network
Fig. 10. The framework of TriDNR [53]. Image extracted from [53]. embedding algorithm for classifying nodes. They learn the
representations of all types of nodes in a common vector
models the node-content correlations and the third term space, and perform the inference in this space. In particular,
models the label-node correspondences. As a result, the for the node ui with type ti , they utilize a linear classifica-
learned representations is enhanced by network structure, t
tion function fui to predict its label and adopt the hinge-loss
node content, and node labels. function D to measure the loss with the true label yi :
LANE [54] is also proposed to incorporate the label infor-
mation into the attributed network embedding. Unlike the X
l
t
previous network embedding methods, LANE is mainly Dðfui ðui Þ; yi Þ; (14)
based on spectral techniques [55]. LANE adopts the cosine i¼1
similarity to construct the corresponding affinity matrices
where l is the number of labeled nodes. To preserve the
of the node attributes, network structure, and labels. Then,
local structures in the latent space, they impose the follow-
based on the corresponding Laplacian matrices, LANE is
ing smoothness constraint, which enforces that two nodes i
able to map the three different sources into different latent
and j will be close in the latent space if they have a large
representations, respectively. In order to build the relation-
weight Wij in the heterogeneous network:
ship among those three representations, LANE projects all
these latent representations into a new common space by X
Wij kui  uj k2 : (15)
leveraging the variance of the projected matrix as the corre-
i;j
lation metric. The learned representations of nodes are able
to capture the structure proximities as well as the correla- In this way, different types of nodes are mapped into a com-
tions in the label informed attributed network. mon latent space. The overall loss function combines the
Huang et al. [56] pay more attentions on the scalability of classification and regularization losses Eqs. (14) and (15). A
attributed network embedding. The proposed method, stochastic gradient descent method is used here to learn the
named AANE, is based on the decomposition of attribute representations of nodes in a heterogeneous network for
affinity matrix and the penalty of embedding difference classifying.
between linked nodes. AANE provides a distributed opti- Chang et al. [16] propose a deep embedding algorithm
mization algorithm to process each node efficiently. for heterogeneous networks, whose nodes have various
Wei et al. [57] study the problem of cross view link pre- types. The main goal of the heterogeneous network embed-
diction (CVLP) based on attributed network embedding, ding is to learn the representations of nodes with different
i.e., to recommend nodes with only links to nodes with only types such that the heterogeneous network structure can be
attributes (or vice versa). The proposed model learns the well preserved. As shown in Fig. 11, given a heterogeneous

Fig. 11. Overview of the method proposed by Chang et al. [16]. Image extracted from [16].

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 843

network with two types of data (e.g., images and texts), continuous space. Specifically, the diffusion kernel in a
there are three types of edges, i.e., image-image, text-text, d-dimensional Euclidean space is defined as
and image-text. The nonlinear embeddings of images and
kjik2
texts are learned by a CNN model and the fully connected Kðt; j; iÞ ¼ ð4PtÞ2 e
d
4t : (16)
layers, respectively. By cascading the extra linear embed-
ding layer, the representations of images and texts can be It models the heat at location i at time t when an initial unit
mapped to a common space. In the common space, the heat is positioned at location j, which also models how
similarities between data from different modalities can be information spreads between nodes in a network.
directly measured, so that if there is an edge in the original The goal of the proposed algorithm is to learn the repre-
heterogeneous network, the pair of data has similar sentations of nodes in the latent space such that the diffu-
representations. sion kernel can best explain the cascades in the training set.
Huang and Mamoulis [59] propose a meta path similarity Given the representation uj of the initial contaminated node
preserving heterogeneous information network embedding j in cascade c, the contamination score of node i can be com-
algorithm. To model a particular relationship, a meta path puted by
[60] is a sequence of object types with edge types in between.
They develop a fast dynamic programming approach to cal- d kuj ui k2

culate the truncated meta path based proximities, whose time Kðt; j; iÞ ¼ ð4PtÞ2 e 4t : (17)
complexity is linear to the size of the network. They adopt a The intuition of Eq. (17) is that the closer a node in the latent
similar strategy as LINE [10] to preserve the proximity in the space is from the source node, the sooner it is infected by
low dimensional space. information from the source node. As the cascade c offers a
Xu et al. [61] propose a network embedding method for guidance for the information diffusion of nodes, we expect
coupled heterogeneous network. The coupled heteroge- the contamination score to be as closely consistent with c as
neous network consists of two different but related homoge- possible, which gives rise to the following empirical risk
neous networks. For each homogeneous network, they function:
adopt the same function (Eq. (6)) as LINE to model the rela-
tionships between nodes. Then the harmonious embedding X
LðUÞ ¼ DðKð:; j; :Þ; cÞ; (18)
matrix is introduced to measure the closeness between c
nodes of different networks. Because the inter-network
edges are able to provide the complementary information in where function D is a measure of the difference between the
the presence of intra-network edges, the learned embed- predicted score and the observed diffusion in c. By minimiz-
dings of nodes also perform well on several tasks. ing the Eq. (18) and reformulating it as a ranking problem,
the optimal representations U of nodes can be obtained.
5.3 Summary The cascade prediction problem here is defined as pre-
In the methods preserving side information, side informa- dicting the increment of cascade size after a given time
tion introduces additional proximity measures so that the interval [18]. Li et al. [18] argue that the previous work on
relationships between nodes can be learned more compre- cascade prediction all depends on the bag of hand-crafting
hensively. Their difference is the way of integrating net- features to represent the cascade and network structures.
work structures and side information. Many of them are Instead, they present an end-to-end deep learning model to
naturally extensions from structure preserving network solve this problem using the idea of network embedding, as
embedding methods. illustrated in Fig. 12. Similar to DeepWalk [3], they perform
a random walk over a cascade graph to sample a set of
paths. Then the Gated Recurrent Unite (GRU) [64], a specific
6 ADVANCED INFORMATION PRESERVING
type of recurrent neural network [65], is applied to these
NETWORK EMBEDDING paths and learn the embeddings for these paths. The atten-
In this section, we review network embedding methods that tion mechanism is then used to assemble these embeddings
take additional advanced information into account so as to to learn the representation of this cascade graph. Once the
solve some specific analytic tasks. Different from side infor- representation of this cascade is known, a multi-layer per-
mation, the advanced information refers to the supervised ceptron [66] can be adopted to output the final predicted
or pseudo supervised information in a specific task. size of this cascade. The whole procedure is able to learn the
representation of cascade graph in an end-to-end manner.
6.1 Information Diffusion The experimental results on the Twitter and Aminer net-
Information diffusion [62] is an ubiquitous phenomenon on works show promising performance on this task.
the web, especially in social networks. Many real applica-
tions, such as marketing, public opinion formation, epidem- 6.2 Anomaly Detection
ics, are related to information diffusion. Most of the Anomaly detection has been widely investigated in previ-
previous studies on information diffusion are conducted in ous work [67]. Anomaly detection in networks aims to infer
original network spaces. the structural inconsistencies, which means the anomalous
Recently, Simon et al. [63] propose a social network embed- nodes that connect to various diverse influential communi-
ding algorithm for predicting information diffusion. The basic ties [21], [68], such as the red node in Fig. 13. Hu et al. [21]
idea is to map the observed information diffusion process into propose a network embedding based method for anomaly
a heat diffusion process modeled by a diffusion kernel in the detection. In particular, in the proposed model, the kth

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
844 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

Fig. 12. The end-to-end pipeline of DeepCas proposed by Li et al. [18]. Image extracted from [18].

element uki in the embedding ui of node i represents the cor- First, Man et al. [22] extend the original sparse networks
relation between node i and community k. Then, they Gs and Gt to the denser networks. The basic idea is that
assume that the community memberships of two linked given a pair of users with anchor links, if they have a con-
nodes should be similar. Therefore, they can minimize the nection in one network, so do their counterparts in the other
following objective function: network [69], in this way, more links will be added to the
X X original networks. For a pair of nodes i and j whose repre-
L¼ kui  uj k2 þ a ðkui  uj k  1Þ2 : (19) sentations are ui and uj , respectively, by combining the neg-
ði;jÞ2E ði;jÞ 2
=E ative sampling strategy, they use the following function to
preserve the structures of Gs and Gt in a vector space:
This optimization problem can be solved by the gradient
descent method. By taking the neighbors of a node into X
K
account, the embedding of the node can be obtained by a log sðuTi uj Þ þ Evk /Pn ðvÞ ½log ð1  sðuTi uk ÞÞ; (20)
weighted sum of the embeddings of all its neighbors. An k¼1
anomaly node in this context is one connecting to a set of
different communities. Since the learned embedding of where sðxÞ ¼ 1=ð1 þ expðxÞÞ. The first term models the
nodes captures the correlations between nodes and commu- observed edges, and the second term samples K negative
nities, based on the embedding, they propose a new mea- edges.
sure to indicate the anomalousness level of a node. The Then given the observed anchor links ðvsi ; utj Þ 2 T and the
larger the value of the measure, the higher the propensity representations ui and uj , they aim to learn a mapping
for a node being an anomaly node.

6.3 Network Alignment


The goal of network alignment is to establish the corres-
pondence between the nodes from two networks.
Man et al. [22] propose a network embedding algorithm to
predict the anchor links across social networks. The same
users who are shared by different social networks naturally
form the anchor links, and these links bridge the different
networks. As illustrated in Fig. 14, the anchor link predic-
tion problem is, given source network Gs and target net-
work Gt and a set of observed anchor links T , to identify the
hidden anchor links across Gs and Gt .

Fig. 13. The anomalous (red) nodes in embedding, and A, B, C, and D Fig. 14. The illustrative diagram of network embedding for anchor link
are four communities [21]. Image extracted from [21]. prediction proposed by Man et al. [22]. Image extracted from [22].

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 845

TABLE 1
A Summary of Different Types of Network Embedding Methods

Categorization Technique
Method
Network Side Advanced Matrix Random Deep neural others
alone information information factorization walk network
p p
DeepWalk [3] p p
LINE [10] p p
GraRep [34] p p
SDNE [6] p p
Node2vec [25] p p
M-NMF [4] p p
Cao et al. [26] p p
Chen et al. [40] p p
HOPE [8] p p
SiNE [13] p p
Ou et al. [44] p p
MMDW [14] p p
Le et al. [50] p p
TADW [50] p p
Sun et al. [52] p p
Pan et al. [53] p p
LANE [54] p p
Yann et al. [58] p p
Chang et al. [16] p p
Huang et al. [59] p p
Xu et al. [61] p p
Simon et al. [63] p p
Li et al. [18] p p
Hu et al. [21] p p
Man et al. [22]

function f parameterized by u so as to bridge these two rep- structure so as to learn the representations of nodes. The
resentations. The loss function is defined as: other is to establish the connection between the representa-
tions of nodes and the target task. The first one is similar to
kfðui ; uÞ  uj kF : (21) structure and property preserving network embedding,
while the second one usually needs to consider the domain
The mapping function can be linear or non-linear via Multi- knowledge of a specific task. The domain knowledge
Layer Perceptron (MLP) [66]. By optimizing Eq. (20) and encoded by the advanced information makes it possible to
Eq. (21) simultaneously, the representations that can pre- develop end-to-end solutions for network applications.
serve the network structure and respect the observed Compared with the hand-crafted network features, such as
anchor links can be learned. numerous network centrality measures, the combination of
advanced information and network embedding techniques
6.4 Summary enables representation learning for networks. Many network
Advanced information preserving network embedding usu- applications may be benefitted from this new paradigm.
ally consists of two parts. One is to preserve the network All the aforementioned network embedding methods are
summarized in Table 1. Because some works lack of the
TABLE 2 technique details, here we mainly summarize the time com-
The Complexity of Some Representative Network plexity of some representative network embedding meth-
Embedding Methods ods, shown in Table 2. In particular, K is the representation
dimension, and N and jEj are the number of nodes and
Method Time complexity edges, respectively. I is the number of iterations. L is the
DeepWalk [3] OðKNlog NÞ walk length and w is the number of random-walk trials. e is
LINE [10] OðKjEjÞ the number of epochs, and S is the number of sampled
GraRep [34] OðNjEj þ KN 2 Þ
training triplets and J is a constant related with the struc-
SDNE [6] OðKIjEjÞ
Node2vec [25] OðKNÞ ture of deep neural network. m is the length of node attrib-
M-NMF [4] OðKIN 2 Þ utes. nzðXÞ is the number of non-zeros in matrix X. As we
Chen et al. [40] OðKmaxðjEj; NLwÞÞ can see, the time complexity of some network embedding
HOPE [8] OðK 2 IjEjÞ methods, such as DeepWalk, LINE, and Node2vec, is linear
SiNE [13] OðeNSJÞ with respect to the number of nodes N. Therefore, these
TADW [50] OðNjEj þ KIjEj þ KmIN þ K 2 INÞ methods are usually scalable for large networks. While,
TriDNR [53] OðK nzðXÞlog ðmÞ þ KNlog ðNÞÞ
some works, such as GraRep, M-NMF, and LANE, have the
LANE [54] OðmN 2 þ KIN 2 Þ
Yann et al. [58] OðjEjÞ quadratic complexity, which may cause the scalability issue
and limit them to be applied to the large networks.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
846 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

TABLE 3
A Summary of Real World Networks

structure and property network embedding


networks preserving network with side classification link prediction clustering visualization
embedding information
p p p p p
BLOGCATALOG p p p p p
FLICKR p p p p p
YOUTUBE p p
Twitter p p p p p p
DBLP p p p p p p
Cora p p p p p p
Citeseer p p
ArXiv p p p p p
Wikipedia p p
PPI

7 NETWORK EMBEDDING IN PRACTICE 7.1.1 Social Networks


In this section, we summarize the data sets, benchmarks,  BLOGCATALOG [70]. This is a network of social rela-
and evaluation tasks that are commonly used in developing tionships of the bloggers listed on the BlogCatalog web-
new network embedding methods. site. One instance of this data set can be found at http://
socialcomputing.asu.edu/datasets/BlogCatalog3.
7.1 Real World Data Sets  FLICKR [70]. This is a network of the contacts
Getting real network data sets in academic research is between users of the photo sharing websites Flickr.
always far from trivial. Here, we describe some most popu- One instance of the network can be downloaded at
lar real world networks currently used in network embed- http://socialcomputing.asu.edu/datasets/Flickr.
ding literature. The data sets can be roughly divided into  YOUTUBE [71]. This is a network between users of
four groups according to the nature of the networks: social the popular video sharing website, Youtube. One
networks, citation networks, language networks, and bio- instance of the network can be found at http://
logical networks. A summary of these data sets can be socialcomputing.asu.edu/datasets/YouTube2.
found in Table 3. Please note that, the same name may be  Twitter [72]. This is a network between users on a
used to refer to different variants in different studies. Here social news website Twitter. One instance of the net-
we aim to provide an overview of the networks, and do not work can be downloaded at http://socialcomputing.
attempt to describe all of those variants in detail. asu.edu/datasets/Twitter.

TABLE 4
A Summary of the Source Code

Structure and property preserving network embedding


Methods Source code
DeepWalk [3] https://github.com/phanein/deepwalk
LINE [10] https://github.com/tangjianpku/LINE
GraRep [34] https://github.com/ShelsonCao/GraRep
SDNE [6] http://nrl.thumedia.org/structural-deep-network-embedding
Node2vec [25] https://github.com/aditya-grover/node2vec
DNGR [26] https://github.com/ShelsonCao/DNGR
M-NMF [4] http://nrl.thumedia.org/community-preserving-network-embedding
GED [40] https://users.ece.cmu.edu/sihengc/publications.html
Ou et al. [44] http://nrl.thumedia.org/non-transitive-hashing-with-latent-similarity-components
HOPE [8] http://nrl.thumedia.org/asymmetric-transitivity-preserving-graph-embedding
Network embedding with side information
Methods Source code
MMDW [14] https://github.com/thunlp/mmdw
TADW [15] https://github.com/thunlp/tadw
TriDNR [53] https://github.com/shiruipan/TriDNR
Advanced information preserving network embedding
Methods Source code
Information diffusion [63] https://github.com/ludc/social_network_diffusion_embeddings
Cascade prediction [18] https://github.com/chengli-um/DeepCas
Anomaly detection [21] https://github.com/hurenjun/EmbeddingAnomalyDetection
Collaboration prediction [23] https://github.com/chentingpc/GuidedHeteEmbedding

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 847

7.1.2 Citation Networks has been assessed in [3], [6], [10], [54]. Some studies [3], [6],
[10] apply their algorithms to the Youtube network, which
 DBLP [73]. This network represents the citation rela- also achieves promising classification results. A citation net-
tionships between authors and papers. One instance work usually represents the citation relationships between
of the data set can be found at http://arnetminer. authors or between papers. For example, [10], [53] use the
org/citation. DBLP network to test the classification performance. Cora is
 Cora [74]. This network represents the citation rela- used in [14], [15]. Citeseer is used in [14], [15], [53]. The classi-
tionships between scientific publications. Besides the fication performance on language networks, such as Wikipe-
link information, each publication is also associated dia, is also widely studied [10], [14], [15], [25]. The Protein-
with a word vector indicating the absence/presence Protein Interactions (PPI) is used in [25]. Based on NUS-
of the corresponding words from the dictionary. One WIDE [80], a heterogeneous network extracted from Flickr,
instance of the data set can be found at https:// Chang et al. [16] validated the superior classification perfor-
linqs.soe.ucsc.edu/node/236. mance of network embedding on heterogeneous networks.
 Citeseer [74]. This network, similar to Cora, also consists To summarize, network embedding algorithms have
of scientific publications and their citation relation- been widely used on various networks and have been well
ships. One instance of the data set can be downloaded demonstrated their effectiveness on node classification.
at https://linqs.soe.ucsc.edu/node/236.
 ArXiv [75], [76]. This is the collaboration network 7.3 Link Prediction
constructed from the ArXiv website. One instance of
Link prediction, as one of the most fundamental problems
the data set can be found at http://snap.stanford.
on network analysis, has received a considerable amount of
edu/data/ca-AstroPh.html.
attention [7], [82]. It aims to estimate the likelihood of the
7.1.3 Language Networks existence of an edge between two nodes based on observed
network structure [83]. Since network embedding algo-
 Wikipedia [77]. This is a word co-occurrence network rithms are able to learn the vector based features for each
from the English Wikipedia pages. One instance of the node, the similarity between nodes can be easily estimated,
data set can be found at http://www.mattmahoney. for example, by the inner product or the cosine similarity.
net/dc/textdata. A larger similarity implies that the two nodes may have a
higher propensity to be linked. Generally, precision @k and
7.1.4 Biological Networks Mean Average Precision (MAP) are used to evaluate the
link prediction performance [6].
 PPI [78]. This is a subgraph of the biological network The popularly used real networks for the link prediction
that represents the pairwise physical interactions task can be divided into three categories: citation networks
between proteins in yeast. One instance of the data set (ARXIV [75], [76] and DBLP1), social networks (SN-TWeibo2,
can be downloaded at http://konect.uni-koblenz.de/ SN-Twitter [72], Facebook [76], Epinions3, and Slashdot4),
networks/maayan-vidal. and biological networks (PPI [78]). Specifically, [6] and [25]
test the effectiveness on ARXIV5. HOPE [8] applies network
7.2 Node Classification embedding to link prediction on two directed networks SN-
Given some nodes with known labels in a network, the node Twitter, which is a subnetwork of Twitter6, and SN-TWeibo,
classification problem is to classify the rest nodes into differ- which is a subnetwork of the social network in Tencent
ent classes. Node classification is one of most primary appli- Weibo7. Node2vec [25] tests the performance of link predic-
cations for network embedding [3], [10]. Essentially, node tion on a social network Facebook and a biological network
classification based on network embedding for can be PPI. EOE [61] uses DBLP to demonstrate the effectiveness on
divided into three steps. First, a network embedding algo- citation networks. Based on two social networks, Epinions
rithm is applied to embed the network into a low dimen- and Slashdot, SiNE [13] shows the superior performance of
sional space. Then, the nodes with known labels are used as signed network embedding on link prediction.
the training set. Last, a classifier, such as Liblinear [79], is To sum up, network embedding is able to capture inher-
learned from the training set. Using the trained classifier, ent network structures, and thus naturally it is suitable for
we can infer the labels of the rest nodes. The popularly used link prediction applications. Extensive experiments on vari-
evaluation metrics for multi-label classification problem ous networks have demonstrated that network embedding
include Micro-F1 and Macro-F1 [70]. can tackle link prediction effectively.
The multi-label classification application has been suc-
cessfully tested on four categories of data sets, namely social 7.4 Node Clustering
networks (BLOGCATALOG [70], FLICKR [70], and YOU-
Node clustering is to divide the nodes in a network into
TUBE [71]), citation networks (DBLP [73], Cora [74], and
clusters such that the nodes within the same cluster are
Citeseer [74]), language networks (Wikipedia [77]), and bio-
logical networks (PPI [78]).
1. http://dblp.uni-trier.de/
Specifically, a social network usually is a communication 2. http://www.kddcup2012.org/c/kddcup2012-track1/data
network among users on online platforms. DeepWalk [3], 3. http://www.epinions.com/
GraRep [34], SDNE [6], node2vec [25], and LANE [54] con- 4. http://slashdot.org/
5. https://arxiv.org/
duct classification on BLOGCATALOG to evaluate the per- 6. https://twitter.com/
formance. Also, the classification performance on FLICKR 7. http://t.qq.com/

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
848 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

Fig. 15. Network visualization of 20-NewsGroup by different network embedding algorithms, i.e., SDNE [6], LINE [10], DeepWalk [3], GraRep [34],
and LE [81]. Image extracted from SDNE [6].

more similar to each other than the nodes in different clus- 7.5 Network Visualization
ters. Network embedding algorithms learn representations Another important application of network embedding is
of nodes in low dimensional vector spaces, so many network visualization, that is, generating meaningful visu-
typical clustering methods, such as Kmeans [84], can be alization that layouts a network on a two dimensional
directly adopted to cluster nodes based on their learned space. By applying the visualization tool, such as t-
representations. SNE [90], to the learned low dimensional representations of
Many evaluation criteria have been proposed for cluster- nodes, it is easy for users to see a big picture of a sophisti-
ing evaluation. Accuracy (AC) and normalized mutual cated network so that the community structure or node cen-
information (NMI) [85] are frequently used to assess the trality can be easily revealed.
clustering performance on graphs and networks. More often than not, the quality of network visualization
The node clustering performance is tested on three types by different network embedding algorithms is evaluated
of networks: social networks (e.g., Facebook [86] and visually. Fig. 15 is an example by SDNE [6] where SDNE is
YELP [59]), citation networks (e.g., DBLP [60]), and docu- applied to 20-NewsGroup. In Fig. 15, each document is
ment networks (e.g., 20-NewsGroup [87]). In particular, [16] mapped into a two dimensional space as a point, and differ-
extracts a social network from a social blogging site. It uses ent colors on the points represent the labels. As can be seen,
the TF-IDF features extracted from the blogs as the features network embedding preserves the intrinsic structure of
of blog users and the “following” behaviors to construct the the network, where similar nodes are closer to each other
linkages. It successfully applies network embedding to the than dissimilar nodes in the low-dimensional space. Also,
node clustering task. [4] uses the Facebook social network LINE [10], GraRep [34], and EOE [61] are applied to a cita-
to demonstrate the effectiveness of community preserving tion network DBLP and generate meaningful layout of the
network embedding on node clustering. [59] is applied network. Pan et al. [53] show the visualization of another
to more social networks including MOVIE, a network citation network Citeseer-M10 [91] consisting of scientific
extracted from YAGO [88] that contains knowledge about publications from ten distinct research areas.
movies, YELP, a network extracted from YELP that is about
reviews given to restaurants, and GAME, extracted from 7.6 Open Source Software
Freebase [89] that is related to video games. [26] tests In Table 4, we provide a collection of links where one can find
the node clustering performance on a document network, the source code of various network embedding methods.
20-NewsGroup network, which consists of documents.
The node clustering performance on citation networks is
8 CONCLUSIONS AND FUTURE RESEARCH
tested [59] by clustering authors in DBLP. The results show
the superior clustering performance on citation networks. DIRECTIONS
In summary, node clustering based on network embed- The above survey of the state-of-the-art network embedding
ding is tested on different types of networks. Network algorithms clearly shows that it is still a young and promis-
embedding has become an effective method to solve the ing research field. To apply network embedding to tackle
node clustering problem. practical applications, a frontmost question is to select the
appropriate methods. In Fig. 16 we show the relationship
among different types of network embedding methods dis-
cussed in this survey.
The structure and property preserving network embed-
ding is the foundation. If one cannot preserve well the net-
work structure and retain the important network
properties, in the embedding space serious information is
loss, which hurts the analytic tasks in sequel. Based on the
structure and property preserving network embedding, one
may apply the off-the-shelf machine learning methods. If
some side information is available, it can be incorporated
into network embedding. Furthermore, the domain knowl-
edge of some certain applications as advanced information
can be considered.
Fig. 16. Relationship among different types of network embedding In the rest of this section, we discuss several interesting
methods. directions for future work.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 849

8.1 More Structures and Properties rumors in social network [93], [94]? Can we use network
Although various methods are proposed to preserve struc- embedding to infer social ties [95]? Each real world applica-
tures and properties, such as first order and high order prox- tion has its own characteristics, and incorporating their
imities, communities, asymmetric transitivity, and structural unique domain knowledge into network embedding is a key.
balance, due to the complexity of real world networks, there The technical challenges here are how to model the specific
are still some particular structures that are not fully consid- domain knowledge as advanced information that can be inte-
ered in the existing network embedding methods. For exam- grated into network embedding in an effective manner.
ple, how to incorporate network motifs [92], one of the most
common higher-order structures in a network, into network 8.4 Dynamic Network Embedding
embedding remains an open problem. Also, more complex Although many network embedding methods are proposed,
local structures of a node can be considered to provide they are mainly designed for static networks. However, in
higher level constraints. The current assumption of network real world applications, it is well recognized that many net-
embedding is usually based on the pairwise structure, that works are evolving over time. For example, in the Facebook
is, if two nodes have a link, then their representations are network, friendships between users always dynamically
similar. This assumption can work well for some applica- change over time, e.g., new edges are continuously added to
tions, such as link prediction, but it cannot encode the the social network while some edges may be deleted. To
centrality information of nodes, because the centrality of a learn the representations of nodes in a dynamic network, the
node is usually related to a more complex structure. As existing network embedding methods have to be run repeat-
another example, in several real world applications, an edge edly for each time stamp, which is very time consuming and
may involve more than two nodes, known as a hyperedge. may not meet the realtime processing demand. Most of the
Such a hypernetwork naturally indicates richer relationships existing network embedding methods cannot be directly
among nodes and has its own characteristics. Hypernetwork applied to large scale evolving networks. New network
embedding is important for some real applications. embedding algorithms, which are able to tackle the dynamic
The power law distribution property indicates that most nature of evolving networks, are highly desirable.
nodes in a network are associated with a small number of
edges. Consequently, it is hard to learn an effective repre-
sentation for a node with limited information. How this 8.5 More Embedding Spaces
property affects the performance of network embedding The existing network embedding methods embed a net-
and how to improve the embeddings of the minority nodes work into the Euclidean space. In general, the principle of
are still largely untouched. network embedding can be extended to other target spaces.
For example, recently some studies [96] assume that the
8.2 The Effect of Side Information underlying structure of a network is in the hyperbolic space.
Section 5 discusses a series of network embedding algo- Under this assumption, heterogeneous degree distributions
rithms that preserve side information in embedding. All the and strong clustering emerge naturally, as they are the sim-
existing methods assume that there is an agreement between ple reflections of the negative curvature and metric property
network structure and side information. To what extent the of the underlying hyperbolic geometry. Exploring other
assumption holds in real applications, however, remains an embedding space is another interesting research direction.
open question. The low correlation of side information and
structures may degrade the performance of network embed- ACKNOWLEDGMENTS
ding. Moreover, it is interesting to explore the complemen-
This work was supported in part by the National Program on
tarity between network structures and side information.
Key Basic Research Project (No. 2015CB352300), National
More often than not, each information may contain some
Natural Science Foundation of China (No. U1611461,
knowledge that other information does not have.
No. 61772304, No. 61521002, No. 61531006, No. 61702296), the
Besides, in a heterogeneous information network, to
research fund of the Tsinghua-Tencent Joint Laboratory for
measure the relevance of two objects, the meta path, a
Internet Innovation Technology, and the Young Elite Scientist
sequence of object types with edge types in between, has
Sponsorship Program by CAST. The work of Jian Pei has been
been widely used. However, meta structure [88], which is
supported in part by the NSERC Discovery Grant program,
essentially a directed acyclic graph of object and edge types,
the Canada Research Chair program, and the NSERC Strate-
provides a higher-order structure constraint. This suggests
gic Grant program. All opinions, findings, conclusions and
a huge potential direction for improving heterogeneous
recommendations in this paper are those of the authors and
information network embedding.
do not necessarily reflect the views of the funding agencies.
8.3 More Advanced Information and Tasks
In general, most of network embedding algorithms are
REFERENCES
designed for general purposes, such as link prediction and [1] C. L. Staudt, A. Sazonovs, and H. Meyerhenke, “Networkit: A tool
suite for large-scale network analysis,” Netw. Sci., Cambridge Uni-
node classification. These network embedding methods versity Press, vol. 4, pp. 508–530, 2016.
mainly focus on general network structures and may not be [2] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and
specific to some target applications. Another important T. Eliassi-Rad, “Collective classification in network data,” AI
research direction is to explore the possibility of designing Mag., vol. 29, no. 3, 2008, Art. no. 93.
[3] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning
network embedding for more specific applications. For exam- of social representations,” in Proc. 20th ACM SIGKDD Int. Conf.
ple, whether network embedding is a new way to detect Knowl. Discovery Data Mining, 2014, pp. 701–710.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
850 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

[4] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, [28] J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geo-
“Community preserving network embedding,” in Proc. 31th AAAI metric framework for nonlinear dimensionality reduction,” Sci.,
Conf. Artif. Intell., AAAI Press, 2017, pp. 203–209. vol. 290, no. 5500, pp. 2319–2323, 2000.
[5] I. Herman, G. Melançon, and M. S. Marshall, “Graph visualization [29] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction
and navigation in information visualization: A survey,” IEEE by locally linear embedding,” Sci., vol. 290, no. 5500, pp. 2323–
Trans. Vis. Comput. Graph., vol. 6, no. 1, pp. 24–43, Jan.-Mar. 2000. 2326, 2000.
[6] D. Wang, P. Cui, and W. Zhu, “Structural deep network [30] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral tech-
embedding,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov- niques for embedding and clustering,” in Proc. Adv. Neural Inf.
ery Data Mining, 2016, pp. 1225–1234. Process. Syst., 2002, pp. 585–591.
[7] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem [31] N. Berline, E. Getzler, and M. Vergne, Heat Kernels and Dirac Oper-
for social networks,” J. Assoc. Inf. Sci. Technol., vol. 58, no. 7, ators. New York, NY, USA: Springer Science & Business Media,
pp. 1019–1031, 2007. 2003.
[8] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu, “Asymmetric transi- [32] X. He and P. Niyogi, “Locality preserving projections,” in Proc.
tivity preserving graph embedding,” in Proc. 22nd ACM SIGKDD Adv. Neural Inf. Process. Syst., 2004, pp. 153–160.
Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 672–681. [33] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estima-
[9] Y. Fu and Y. Ma, Graph Embedding for Pattern Analysis. New York, tion of word representations in vector space,” in Proc. ICLR Work-
NY, USA: Springer Science & Business Media, 2012. shops Track, 2013.
[10] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: [34] S. Cao, W. Lu, and Q. Xu, “Grarep: Learning graph representa-
Large-scale information network embedding,” in Proc. 24th Int. tions with global structural information,” in Proc. 24th ACM Int.
Conf. World Wide Web, 2015, pp. 1067–1077. Conf. Inf. Knowl. Manage, 2015, pp. 891–900.
[11] H. Huang, J. Tang, S. Wu, L. Liu, et al., “Mining triadic closure [35] M. Girvan and M. E. Newman, “Community structure in social
patterns in social networks,” in Proc. 23rd Int. Conf. World Wide and biological networks,” Proc. Nat. Academy Sci. USA, vol. 99,
Web, 2014, pp. 499–504. no. 12, pp. 7821–7826, 2002.
[12] D. Cartwright and F. Harary, “Structural balance: A generaliza- [36] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix
tion of heider’s theory,” Psychological Rev., vol. 63, no. 5, 1956, factorization,” in Proc. Adv. Neural Inf. Process. Syst., 2001, pp. 556–
Art. no. 277. 562.
[13] S. Wang, J. Tang, C. Aggarwal, Y. Chang, and H. Liu, “Signed net- [37] M. E. Newman, “Finding community structure in networks using
work embedding in social media,” in Proc. SIAM Int. Conf. Data the eigenvectors of matrices,” Phys. Rev. E, vol. 74, no. 3, 2006,
Mining, 2017, pp. 327–335. Art. no. 036104.
[14] C. Tu, W. Zhang, Z. Liu, and M. Sun, “Max-margin deepwalk: [38] O. Levy and Y. Goldberg, “Neural word embedding as implicit
Discriminative learning of network representation,” in Proc. 25th matrix factorization,” in Proc. Adv. Neural Inf. Process. Syst., 2014,
Int. Joint Conf. Artif. Intell., 2016, pp. 3889–3895. pp. 2177–2185.
[15] C. Yang, Z. Liu, D. Zhao, M. Sun, and E. Y. Chang, “Network [39] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol,
representation learning with rich text information,” in Proc. 24th “Stacked denoising autoencoders: Learning useful representations
Int. Joint Conf. Artif. Intell., 2015, pp. 2111–2117. in a deep network with a local denoising criterion,” J. Mach. Learn.
[16] S. Chang, W. Han, J. Tang, G.-J. Qi, C. C. Aggarwal, and Res., vol. 11, no. Dec, pp. 3371–3408, 2010.
T. S. Huang, “Heterogeneous network embedding via deep [40] S. Chen, S. Niu, L. Akoglu, J. Kovacevic, and C. Faloutsos, “Fast,
architectures,” in Proc. 21th ACM SIGKDD Int. Conf. Knowl Discov- warped graph embedding: Unifying framework and one-click
ery Data Mining, 2015, pp. 119–128. algorithm,” arXiv preprint arXiv:1702.05764, 2017.
[17] N. Natarajan and I. S. Dhillon, “Inductive matrix completion for [41] P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in Proc.
predicting gene–disease associations,” Bioinf., vol. 30, no. 12, 21th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015,
pp. i60–i68, 2014. pp. 1365–1374.
[18] C. Li, J. Ma, X. Guo, and Q. Mei, “Deepcas: An end-to-end predic- [42] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional
tor of information cascades,” in Proc. 26th Int. Conf. World Wide neural networks for graphs,” in Proc. Int. Conf. Mach. Learn., 2016,
Web, 2017, pp. 577–586. pp. 2014–2023.
[19] S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei, “End-to-end [43] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature,
learning of action detection from frame glimpses in videos,” in vol. 521, no. 7553, 2015, Art. no. 436.
Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2016, pp. 2678– [44] M. Ou, P. Cui, F. Wang, J. Wang, and W. Zhu, “Non-transitive
2687. hashing with latent similarity components,” in Proc. 21th ACM
[20] X. Yang, Y.-N. Chen, D. Hakkani-T€ ur, P. Crook, X. Li, J. Gao, and SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 895–
L. Deng, “End-to-end joint learning of natural language under- 904.
standing and dialogue manager,” in Proc. IEEE Int. Conf. Acoust. [45] L. Katz, “A new status index derived from sociometric analysis,”
Speech Signal Process., 2017, pp. 5690–5694. Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[21] R. Hu, C. C. Aggarwal, S. Ma, and J. Huai, “An embedding [46] L. A. Adamic and E. Adar, “Friends and neighbors on the web,”
approach to anomaly detection,” in Proc. IEEE 32nd Int. Conf. Data Soc. Netw., vol. 25, no. 3, pp. 211–230, 2003.
Eng., 2016, pp. 385–396. [47] C. C. Paige and M. A. Saunders, “Towards a generalized singular
[22] T. Man, H. Shen, S. Liu, X. Jin, and X. Cheng, “Predict anchor links value decomposition,” SIAM J. Numerical Anal., vol. 18, no. 3,
across social networks via an embedding approach,” in Proc. 25th pp. 398–405, 1981.
Int. Joint Conf. Artif. Intell., 2016, pp. 1823–1829. [48] M. Cygan, M. Pilipczuk, M. Pilipczuk, and J. O. Wojtaszczyk,
[23] T. Chen and Y. Sun, “Task-guided and path-augmented het- “Sitting closer to friends than enemies, revisited,” Theory Comput.
erogeneous network embedding for author identification,” in Syst., vol. 56, no. 2, pp. 394–405, 2015.
Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, [49] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf,
pp. 295–304. “Support vector machines,” IEEE Intell. Syst. Appl., vol. 13, no. 4,
[24] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, pp. 18–28, July-Aug. 1988.
“Distributed representations of words and phrases and their [50] T. M. Le and H. W. Lauw, “Probabilistic latent document network
compositionality,” in Proc. Adv. Neural Inf. Process. Syst., 2013, embedding,” in Proc. IEEE Int. Conf. Data Mining, 2014, pp. 270–
pp. 3111–3119. 279.
[25] A. Grover and J. Leskovec, “node2vec: Scalable feature learning [51] J. Chang and D. M. Blei, “Relational topic models for document
for networks,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Dis- networks,” in Proc. Int. Conf. Artif. Intell. Statist., 2009, pp. 81–88.
covery Data Mining, 2016, pp. 1225–1234. [52] X. Sun, J. Guo, X. Ding, and T. Liu, “A general framework
[26] S. Cao, W. Lu, and Q. Xu, “Deep neural networks for learning for content-enhanced network representation learning,”
graph representations,” in Proc. 30th AAAI Conf. Artif. Intell., 2016, arXiv:1610.02906, 2016.
pp. 1145–1152. [53] S. Pan, J. Wu, X. Zhu, C. Zhang, and Y. Wang, “Tri-party deep net-
[27] S. Yan, D. Xu, B. Zhang, and H.-J. Zhang, “Graph embedding: A work representation,” Netw., vol. 11, no. 9, 2016, Art. no. 12.
general framework for dimensionality reduction,” in Proc. IEEE [54] X. Huang, J. Li, and X. Hu, “Label informed attributed network
Comput. Soc. Conf. Comput. Vision Pattern Recognit., 2005, vol. 2, embedding,” in Proc. 10th ACM Int. Conf. Web Search Data Mining,
pp. 830–837. 2017, pp. 731–739.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
CUI ET AL.: A SURVEY ON NETWORK EMBEDDING 851

[55] F. R. K. Chung and F. C. Graham, “Spectral graph theory,” Ameri- [79] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin,
can Mathematical Soc., no. 92, 1997. “Liblinear: A library for large linear classification,” J. Mach. Learn.
[56] X. Huang, J. Li, and X. Hu, “Accelerated attributed network Res., vol. 9, no. Aug, pp. 1871–1874, 2008.
embedding,” in Proc. SIAM Int. Conf. Data Mining, 2017, pp. 633– [80] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, “Nus-
641. wide: A real-world web image database from national university
[57] X. Wei, L. Xu, B. Cao, and P. S. Yu, “Cross view link prediction by of singapore,” in Proc. ACM Int. Conf. Image Video Retrieval, 2009,
learning noise-resilient representation consensus,” in Proc. 26th Art. no. 48.
Int. Conf. World Wide Web, 2017, pp. 1611–1619. [81] M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensional-
[58] Y. Jacob, L. Denoyer, and P. Gallinari, “Learning latent representa- ity reduction and data representation,” Neural Comput., vol. 15,
tions of nodes for classifying in heterogeneous social networks,” no. 6, pp. 1373–1396, 2003.
in Proc. 7th ACM Int. Conf. Web Search Data Mining, 2014, pp. 373– [82] L. L€ u and T. Zhou, “Link prediction in complex networks: A
382. survey,” Physica A: Statist. Mech. Appl., vol. 390, no. 6, pp. 1150–
[59] Z. Huang and N. Mamoulis, “Heterogeneous information net- 1170, 2011.
work embedding for meta path based proximity,” arXiv: [83] L. Getoor and C. P. Diehl, “Link mining: a survey,” ACM SIGKDD
1701.05291, 2017. Explorations Newsletter, vol. 7, no. 2, pp. 3–12, 2005.
[60] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “Pathsim: Meta path- [84] J. MacQueen, et al., “Some methods for classification and analysis
based top-k similarity search in heterogeneous information of multivariate observations,” in Proc.e 5th Berkeley Symp. Math.
networks,” Proc. VLDB Endowment, vol. 4, no. 11, pp. 992–1003, Statist. Probability, 1967, pp. 281–297.
2011. [85] D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized non-
[61] L. Xu, X. Wei, J. Cao, and P. S. Yu, “Embedding of embedding negative matrix factorization for data representation,” IEEE Trans.
(eoe): Joint embedding for coupled heterogeneous networks,” in Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1548–1560, Aug. 2011.
Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, pp. 741– [86] A. L. Traud, P. J. Mucha, and M. A. Porter, “Social structure of
749. facebook networks,” Physica A: Statist. Mech. Appl., vol. 391,
[62] A. Guille, H. Hacid, C. Favre, and D. A. Zighed, “Information dif- no. 16, pp. 4165–4180, 2012.
fusion in online social networks: A survey,” ACM SIGMOD [87] F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu, “Learning deep
Record, vol. 42, no. 2, pp. 17–28, 2013. representations for graph clustering” in Proc. AAAI, 2014,
[63] S. Bourigault, C. Lagnier, S. Lamprier, L. Denoyer, and pp. 1293–1299.
P. Gallinari, “Learning social network embeddings for predicting [88] Z. Huang, Y. Zheng, R. Cheng, Y. Sun, N. Mamoulis, and X. Li,
information diffusion,” in Proc. 7th ACM Int. Conf. Web search Data “Meta structure: Computing relevance in large heterogeneous
Mining, 2014, pp. 393–402. information networks,” in Proc. 22nd ACM SIGKDD Int. Conf.
[64] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Knowl. Discovery Data Mining, 2016, pp. 1595–1604.
Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997. [89] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor,
[65] T. Mikolov, M. Karafi at, L. Burget, J. Cernock
y, and S. Khudanpur, “Freebase: A collaboratively created graph database for structur-
“Recurrent neural network based language model,” Interspeech, ing human knowledge,” in Proc. ACM SIGMOD Int. Conf. Manage.
vol. 2, 2010, Art. no. 3. Data, 2008, pp. 1247–1250.
[66] D. W. Ruck, S. K. Rogers, M. Kabrisky, M. E. Oxley, and [90] M. Laurens van der and G. Hinton, “Visualizing data using t-sne,”
B. W. Suter, “The multilayer perceptron as an approximation to a J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
bayes optimal discriminant function,” IEEE Trans. Neural Netw., [91] K. W. Lim and W. Buntine, “Bibliographic analysis with the cita-
vol. 1, no. 4, pp. 296–298, Dec. 1990. tion network topic model,” arXiv:1609.06826, 2016.
[67] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detec- [92] A. R. Benson, D. F. Gleich, and J. Leskovec, “Higher-order organi-
tion and description: A survey,” Data Mining Knowl. Discovery, zation of complex networks,” Sci., vol. 353, no. 6295, pp. 163–166,
vol. 29, no. 3, pp. 626–688, 2015. 2016.
[68] R. S. Burt, “Structural holes and good ideas,” Amer. J. Sociology, [93] E. Seo, P. Mohapatra, and T. Abdelzaher, “Identifying rumors and
vol. 110, no. 2, pp. 349–399, 2004. their sources in social networks,” SPIE Defense Secur. Sens.,
[69] M. Bayati, M. Gerritsen, D. F. Gleich, A. Saberi, and Y. Wang, pp. 83 891I–83 891I, 2012.
“Algorithms for large, sparse network alignment problems,” in [94] Q. Zhang, S. Zhang, J. Dong, J. Xiong, and X. Cheng, “Automatic
Proc. 9th IEEE Int. Conf. Data Mining, 2009, pp. 705–710. detection of rumor on social network,” in Proc. Natural Lang. Pro-
[70] L. Tang and H. Liu, “Relational learning via latent social cess. Chinese Comput., 2015, pp. 113–122.
dimensions,” in Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discov- [95] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heter-
ery Data Mining, 2009, pp. 817–826. ogenous networks,” in Proc. 5th ACM Int. Conf. Web Search Data
[71] L. tang and H. Liu, “Scalable learning of collective behavior based Mining, 2012, pp. 743–752.
on sparse social dimensions,” in Proc. 18th ACM Conf. Inf. Knowl. [96] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and
Manage., 2009, pp. 1107–1116. M. Bogun a, “Hyperbolic geometry of complex networks,” Phys.
[72] M. De Choudhury, Y.-R. Lin, H. Sundaram, K. S. Candan, L. Xie, Rev. E, vol. 82, no. 3, 2010, Art. no. 036106.
A. Kelliher et al., “How does the data sampling strategy impact
the discovery of information diffusion in social media?” ICWSM, Peng Cui received the PhD degree from
vol. 10, pp. 34–41, 2010. Tsinghua University, in 2010. He is an associate
[73] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, “Arnetminer: professor with tenure at Tsinghua University. His
extraction and mining of academic social networks,” in Proc. 14th research interests include network representation
ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2008, learning, human behavioral modeling, and social-
pp. 990–998. sensed multimedia computing. He has published
[74] A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, more than 60 papers in prestigious conferences
“Automating the construction of internet portals with machine and journals in data mining and multimedia. His
learning,” Inf. Retrieval, vol. 3, no. 2, pp. 127–163, 2000. recent research received the SIGKDD 2016 Best
[75] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graph evolution: Paper Finalist, ICDM 2015 Best Student Paper
Densification and shrinking diameters,” ACM Trans. Knowl. Dis- Award, SIGKDD 2014 Best Paper Finalist, IEEE
covery Data, vol. 1, no. 1, 2007, Art. no. 2. ICME 2014 Best Paper Award, ACM MM12 Grand Challenge Multimodal
[76] J. Leskovec and A. Krevl, “Snap datasets: Stanford large network Award, and MMM13 Best Paper Award. He is an associate editor of
dataset collection (2014),” 2016. [Online]. Available: http://snap. IEEE Transactions on Knowledge and Data Engineering, the IEEE
stanford.edu/data Transactions on Big Data, the ACM Transactions on Multimedia Com-
[77] M. Mahoney, “Large text compression benchmark,” 2011, http:// puting, Communications, and Applications, the Elsevier Journal on Neu-
www. mattmahoney.net/text/text.html rocomputing, etc. He was the recipient of ACM China Rising Star Award
[78] B.-J. Breitkreutz, C. Stark, T. Reguly, L. Boucher, A. Breitkreutz, in 2015.
M. Livstone, R. Oughtred, D. H. Lackner, J. B€ahler, V. Wood,
et al., “The biogrid interaction database: 2008 update,” Nucleic
Acids Res., vol. 36, no. suppl_1, pp. D637–D640, 2007.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.
852 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 31, NO. 5, MAY 2019

Xiao Wang received the PhD degree from the Wenwu Zhu received the PhD degree from New
School of Computer Science and Technology, York University, New York, NY, in 1996. He is
Tianjin University, Tianjin, China, in 2016. He is a currently a professor and the vice chair of the
postdoctoral researcher with the Department of Department of Computer Science, Tsinghua
Computer Science and Technology, Tsinghua University, Beijing, China. Prior to his current
University, Beijing, China. He received the China position, he was a senior researcher and a
Scholarship Council Fellowship, in 2014 and vis- research manager with Microsoft Research Asia,
ited Washington University in St. Louis, as a joint Beijing, China. His current research interests
training student from Nov. 2014 to Nov. 2015. His include the areas of multimedia computing,
current research interests include data mining, communications, and networking, as well as big
social network analysis, and machine learning. data. He has been serving as the editor-in-chief
Until now, he has published more than 20 papers in conferences such for the IEEE Transactions on Multimedia (T-MM) since January 1, 2017.
as AAAI, IJCAI, etc., and journals such as the IEEE Transactions on He was the recipient of five Best Paper Awards including T-CSVT in
Cybernetics, etc. Now his research is sponsored by the National Science 2001 and ACM Multimedia 2012. He is a fellow of the IEEE, AAAS,
Foundation of China. SPIE, and an ACM distinguished scientist.

Jian Pei is currently a vice president of JD.com " For more information on this or any other computing topic,
and a professor with Simon Fraser University. please visit our Digital Library at www.computer.org/publications/dlib.
He is in charge of the big data and smart supply
chain business at JD.com. Recognized as an
ACM fellow and an IEEE fellow, he is responsible
for a series of inventions in data mining and data-
base systems. He has more than 200 technical
publications, which have been cited extensively.
His research has generated remarkable impact
substantially beyond academia. He received
several prestigious awards, including the ACM
SIGKDD Innovation Award, ACM SIGKDD Service Award, IEEE ICDM
Research Contribution Award, and IEEE ICDE Most Influential Paper
Award. He is a fellow of the IEEE.

Authorized licensed use limited to: INSTITUTE OF COMPUTING TECHNOLOGY CAS. Downloaded on March 18,2022 at 04:23:31 UTC from IEEE Xplore. Restrictions apply.

You might also like