Knowledge Graph Recommender System: A Survey
YANG GAO* and YI-FAN LI*, University of Texas at Dallas
1 INTRODUCTION
In modern life, recommender systems (RS) are critical for helping users make proper choices from the huge number of available products and services. For example, AI-driven recommendations account for more than 75% of the time people spend watching YouTube videos [22]. These systems attempt to learn users' interests and shield them from over-choice, which significantly eases their decision-making process [11]. They also help to promote services and sales for businesses such as e-commerce.
There are two main architectures for traditional recommender systems: content-based systems and collaborative-filtering based systems. Content-based systems accept data that can be represented as vectors in a Euclidean space [38] and measure their similarities based on these vectors. Collaborative-filtering based systems usually assume that each user-item interaction is an independent instance with side information encoded [32]. However, in modern society, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented in the form of Knowledge Graphs (KG).
* Both authors contribute equally to this research.
Authors’ addresses: Yang Gao, yxg122530@utdallas.edu; Yi-Fan Li, yli@utdallas.edu, University of Texas at Dallas;
Yu Lin, University of Texas at Dallas, yxl163430@utdallas.edu; Hang Gao, University of Maryland Baltimore County,
hanggao1@umbc.edu; Latifur Khan, University of Texas at Dallas, lkhan@utdallas.edu.
Fig. 1. Illustration of user-item interactions and the knowledge graph on YouTube, which contains users, movies, directors and genres as objects, and liking, categorizing, directing and writing as object relations.
This data structure breaks the independent-interaction assumption by linking items with their attributes. For example, in the YouTube RS (shown in Fig. 1), the connections between users and movies might be actions such as views, likes and comments. On the other hand, movies may share the same genre, director, etc. With this information, a knowledge-aware RS is hence able to capture not only the user-item interactions but also the rich item-item/user-user relations to make more accurate recommendations. It is worth noting that nodes in the KG (e.g., persons or movies) may have distinct neighborhood sizes (i.e., numbers of neighboring nodes), and the relationships between them could vary as well, which makes recommendation with KGs even more challenging.
Recently, there has been increasing interest in extending deep learning approaches to graph data. Motivated by CNNs, RNNs, and autoencoders from deep learning, new neural network architectures have been rapidly developed over the past few years to handle the complexity of graph data. These types of networks are referred to as Graph Neural Networks (GNNs) [38] and are critical to the recent success of knowledge-aware deep recommender systems.
There are a number of existing reviews on the topic of graph neural networks and knowledge graph analysis. Wu et al. [38] give an overview of graph neural networks in the data mining and machine learning fields. Although it is the first comprehensive review of GNNs, this survey mainly focuses on network architectures and only briefly mentions the possibility of applying GNNs to recommendation. On the other hand, Shi et al. [20] summarize the recommender systems utilizing traditional knowledge graph analysis methods, but miss the most recent development of GNN-based algorithms. To the best of our knowledge, our survey is the first review of state-of-the-art deep recommender systems that utilize GNNs to extract information from knowledge graphs for improving recommendation.
Our Contributions: Our paper makes notable contributions summarized as follows:
• New Taxonomy: The core component of knowledge-aware deep recommender systems is the
graph embedding module, which is usually a GNN in the state-of-the-art frameworks. Each
layer of a GNN consists of two basic components: Aggregator and Updater. We categorize the
aggregators into three groups: relation-unaware aggregator, relation-aware subgraph aggregator,
and relation-aware attentive aggregator. We also divide the updaters into three categories, i.e.,
context-only updater, single-interaction updater and multi-interaction updater.
Fig. 3. The general workflow of a GNN-based Knowledge Aware Deep Recommender (GNN-KADR) system.
Definition 2.1. Knowledge Graph G = (V, E). A knowledge graph is a directed graph in which each node v ∈ V belongs to a particular node type in the node type set A: ϕ(v) ∈ A, and each edge e ∈ E belongs to a particular relation type in the relation type set R: ψ(e) ∈ R. Mining knowledge graphs is usually based on the basic entity-relation-entity triplet (u, e, v), where u ∈ V, e ∈ E and v ∈ V denote the head, relation and tail of this triplet. In this paper, we refer to these entity-relation-entity triplets as knowledge triplets for simplicity. Here the types of nodes u and v could be either the same or different depending on the context.
Definition 2.2. Neighborhood N(v). Given a knowledge graph G = (V, E), for a node v, its neighborhood N(v) is defined as the set of nodes that directly connect to v, i.e., {w | (w, e, v) or (v, e, w), e ∈ E}.

Definition 2.3. r-Neighborhood N_r(v). Given a knowledge graph G = (V, E), for a node v, its r-neighborhood N_r(v) is defined as the set of nodes that connect to v with edges of type r, i.e., {w | (w, e, v) or (v, e, w), where e ∈ E and ψ(e) = r}.
User-Item Recommendation. In general, V can be further written as V = V_1 ∪ V_2 ∪ · · · ∪ V_i ∪ · · · ∪ V_n, where V_i denotes the set of nodes with type i and n = |A|. In recommendation we denote V_1 as the set of user nodes and V_2 as the set of item nodes, with V_3, . . . , V_n representing the nodes of other objects (brands, properties, etc.). We also denote E = E_label ∪ E_unlabel, where E_label ⊆ V_1 × V_2 represents the set of edges between user and item nodes, and E_unlabel = E \ E_label represents the other edges. Since a typical recommendation setting in the real world is to predict a user's preferred items based on his/her historical behaviors, we use G = (V, E) to denote the graph constructed from historical data, and G^P = (V^P, E^P) to denote the graph of the real future. The user-item recommendation problem can hence be formulated as a link prediction problem on the graph:

Input: A KG G = (V, E) constructed based on historical data.

Output: A predicted edge set Ê^P_label, which is the prediction of the real edge set E^P_label on G^P.
Throughout this paper, the bold uppercase characters are used to denote matrices and bold
lowercase characters are used to denote vectors. Unless particularly specified, the notations used in
this paper are illustrated in Table 1.
Table 1. Notations used in this paper.

Notation            Description
| · |               The size of a set.
⊙                   Element-wise product.
||                  Concatenation.
G                   A knowledge graph.
V                   The set of nodes in a knowledge graph.
E                   The set of edges between nodes in V.
A                   The node type set.
R                   The edge relation type set.
v                   A node v ∈ V.
e_{i,j}             An edge e_{i,j} ∈ E.
(u, e, v)           A knowledge triplet, where u ∈ V, e = e_{u,v} ∈ E and v ∈ V denote the head, relation and tail of this triplet.
z_v ∈ R^d           The feature vector of node v.
z_e ∈ R^c           The feature vector of edge e = e_{u,v} or e_{v,u}.
z_{e_{i,j}} ∈ R^c   The feature vector of edge e_{i,j}.
n_u ∈ R^k           The context representation of node u.
n_u^r ∈ R^k         The type-r context representation of node u.
X ∈ R^{N×d}         The feature matrix of nodes in a knowledge graph.
A ∈ R^{N×N}         The adjacency matrix of a knowledge graph.
N ∈ R^{N×k}         The context representations of all nodes in a knowledge graph.
N(v)                The neighborhood of a node v.
N_r(v)              The r-neighborhood of a node v.
n                   The number of nodes, n = |V|.
m                   The number of edges, m = |E|.
d                   The dimension of a node feature vector.
c                   The dimension of an edge feature vector.
k                   The dimension of v's context representation.
σ(·)                The sigmoid activation function.
β(·)                The LeakyReLU activation function.
γ(·)                An activation function, e.g., sigmoid, ReLU, etc.
W, Θ, b             Learnable model parameters.
match score between this user and each of the candidate items according to their corresponding
embeddings. Those items with top-N match scores (or match scores above a user-specified threshold)
are linked (recommended) to this user.
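As a rough sketch of this matching step (not any surveyed system's exact implementation; the inner-product scorer, function name and toy data below are assumptions for illustration):

```python
import numpy as np

def recommend_top_n(user_emb, item_embs, n=10, threshold=None):
    """Rank candidate items for one user by inner-product match score.

    user_emb:  (d,)   embedding of the target user
    item_embs: (M, d) embeddings of M candidate items
    Returns the indices of the recommended items.
    """
    scores = item_embs @ user_emb                 # one match score per item
    order = np.argsort(-scores)                   # sort descending by score
    if threshold is not None:
        order = order[scores[order] > threshold]  # optional score cutoff
    return order[:n]

# toy usage: 5 candidate items with 4-dimensional embeddings
rng = np.random.default_rng(0)
print(recommend_top_n(rng.normal(size=4), rng.normal(size=(5, 4)), n=3))
```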
In this process, the core component of the system is the graph embedding module, which is usually
a Graph Neural Network (GNN) in the state-of-the-art frameworks. Each layer of a GNN consists of
two basic components: Aggregator and Updater. For a node v, the Aggregator aggregates the feature
information from the neighbors of v to produce a context representation. The Updater then utilizes
this context representation as well as other input information to obtain a new embedding for node v.
Stacking K different GNN layers or reusing the same GNN layer K times expands GNN’s receptive
field to K-hop graph neighborhoods.
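The following minimal sketch illustrates this aggregate-then-update layering and the K-hop receptive field; the function names and the toy mean/tanh choices are hypothetical and not tied to any particular system surveyed:

```python
import numpy as np

def gnn_embed(features, neighbors, aggregate, update, num_layers=2):
    """Generic GNN skeleton: each layer aggregates neighbor features into a
    context vector n_u, then the updater turns (z_u, n_u) into a new z_u.
    Stacking K layers gives every node a K-hop receptive field.

    features:  dict node -> np.ndarray embedding z_v
    neighbors: dict node -> list of neighboring nodes N(v)
    aggregate: fn(z_u, [z_v for v in N(u)]) -> n_u
    update:    fn(z_u, n_u) -> new z_u
    """
    z = dict(features)
    for _ in range(num_layers):
        z_new = {}
        for u in z:
            n_u = aggregate(z[u], [z[v] for v in neighbors[u]])
            z_new[u] = update(z[u], n_u)
        z = z_new
    return z

# toy usage: mean aggregator + sum-then-tanh updater on a 3-node path graph
feats = {0: np.ones(2), 1: np.zeros(2), 2: -np.ones(2)}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
agg = lambda zu, zs: np.mean(zs, axis=0)
upd = lambda zu, nu: np.tanh(zu + nu)
print(gnn_embed(feats, nbrs, agg, upd))
```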
We categorize the aggregators of GNNs into relation-unaware aggregator, relation-aware subgraph
aggregator and relation-aware attentive aggregator. On the other hand, we divide the updaters into three categories, i.e., context-only updater, single-interaction updater and multi-interaction updater.

Table 2. Taxonomy and representative publications of GNN-based Knowledge Aware Deep Recommender (GNN-KADR) systems.

KGAT [33]
  Aggregator: α*_{u,v} = (W_e z_v)^T tanh(W_e z_u + z_e); α_{u,v} = exp(α*_{u,v}) / Σ_{v'∈N(u)} exp(α*_{u,v'}); n_u = Σ_{v∈N(u)} α_{u,v} z_v
  Updater: z_u^new = β(W_1 · (z_u + n_u)) + β(W_2 · (z_u ⊙ n_u))
  Challenge addressed: —

DANSER [36]
  Aggregator: α_{u,v} = LeakyReLU(w_u(W_e z_e ⊙ (W_p z_u || W_p z_v))) / Σ_{v'∈N(u)} LeakyReLU(w_u(W_e z_e ⊙ (W_p z_u || W_p z_{v'}))); n_u = Σ_{v∈N(u)} α_{u,v} z_v
  Updater: z_u^new = γ(W · n_u + b)
  Challenge addressed: Dynamicity

RecoGCN [41]
  Aggregator: α^r_{v,u} = softmax_v{W_{r1} z_v · W_{r2} z_u | v ∈ N_r(u)}; n_u^r = Σ_{v∈N_r(u)} α^r_{v,u} z_v; n_u = Σ_r n_u^r
  Updater: z_u^new = ReLU(W · (z_u || n_u) + b)
  Challenge addressed: Scalability

PinSage [42]
  Aggregator: n_u = Avg/Pool({ReLU(W_1 z_v + b) | v ∈ N(u)}, {α_v})
  Updater: z_u^new = ReLU(W_2 · (z_u || n_u) + b)
  Challenge addressed: Scalability

STAR-GCN [45]
  Aggregator: n_u^r = Σ_{v∈N_r(u)} (1 / √(|N_r(u)| · |N_r(v)|)) W_r z_v; n_u = Σ_r n_u^r
  Updater: z_u^new = γ(W · γ(n_u))
  Challenge addressed: Cold-Start

IntentGC [46]
  Aggregator: n_u^r = Avg{z_v | v ∈ N_r(u)}; n_u = Σ_{r=1}^{|R|−2} W_r n_u^r
  Updater: z_u^new = γ(W · z_u + n_u)
  Challenge addressed: Scalability
Various categories of aggregators and updaters are described in Figure 4 and Figure 5 respectively. In
the following, we discuss them in detail.
3.1 Aggregator
3.1.1 Relation-unaware Aggregator. For a target node u, a relation-unaware aggregator aims to
aggregate information from part or all of u's neighboring nodes to produce a context representation.
However, in this process, the relation r = ψ (e) (e = eu,v or ev,u ) between the target node u and any
neighboring node v is ignored, and its information is hence not encoded in the context representation
nu .
Fan et al. [4] point out that existing methods used in industry for intent recommendation rely on extensive and laborious feature engineering and fail to utilize the rich interactions between users and items, which limits model performance. To solve these issues, they model the complex objects (i.e., users and items with their attributes) as well as their interactions as a knowledge graph, and present a framework called MEIRec to learn the object embeddings for recommendation. The GNN in MEIRec generates a context representation for a target node u by
nu = д({zv |v ∈ N (u)}) (1)
where z_v is the embedding of node v and д is an aggregation function that can be an average, an LSTM or a CNN depending on the context. For example, they choose д to be the average function if u is an item node, since there is usually no priority among the users that have clicked/purchased an item. On the other hand, they choose an LSTM as д if u represents a user node, because a user usually clicks items with timestamps and its neighbors can therefore be viewed as sequential data.
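A minimal sketch of this relation-unaware aggregation (Eq. 1), assuming plain NumPy vectors for the neighbor embeddings; the order-sensitive LSTM variant for user nodes is only indicated, not implemented:

```python
import numpy as np

def relation_unaware_context(neighbor_embs, mode="average"):
    """Relation-unaware aggregation n_u = g({z_v | v in N(u)}) as in Eq. (1).
    'average' suits item nodes (no ordering among the users who clicked an
    item); an order-sensitive g such as an LSTM would suit user nodes whose
    clicked items form a time-stamped sequence (not shown here)."""
    Z = np.stack(neighbor_embs)            # shape (|N(u)|, d)
    if mode == "average":
        return Z.mean(axis=0)
    raise NotImplementedError(mode)

print(relation_unaware_context([np.array([1.0, 2.0]), np.array([3.0, 4.0])]))
```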
Ying et al. [42] introduce a data-efficient large-scale deep recommendation engine, PinSage, that is developed and deployed at Pinterest. PinSage combines efficient random walks and GNNs to generate embeddings of nodes that incorporate both graph structure and node feature information. For each target node u, PinSage first measures the importance of u's neighboring nodes by simulating random walks starting from u and computing the L1-normalized visit counts of the visited nodes. Then the context representation n_u is computed by

n_u = Avg/Pool({ReLU(W_1 z_v + b) | v ∈ N(u)}, {α_v})   (2)

where W_1, b are parameters to learn, and {α_v} are the L1-normalized visit counts of u's corresponding neighboring nodes.
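A small sketch of the importance pooling in Eq. (2), assuming the visit counts are already available; the names and toy shapes are illustrative only:

```python
import numpy as np

def pinsage_context(W1, b, neighbor_embs, visit_counts):
    """Importance pooling of Eq. (2): a weighted average of transformed
    neighbor features, weighted by L1-normalized random-walk visit counts."""
    alphas = np.asarray(visit_counts, dtype=float)
    alphas /= alphas.sum()                                     # L1 normalization
    H = np.maximum(0.0, np.stack(neighbor_embs) @ W1.T + b)    # ReLU(W1 z_v + b)
    return alphas @ H                                          # weighted average

d, k = 4, 3
rng = np.random.default_rng(1)
print(pinsage_context(rng.normal(size=(k, d)), np.zeros(k),
                      [rng.normal(size=d) for _ in range(5)],
                      visit_counts=[7, 3, 2, 1, 1]))
```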
3.1.2 Relation-aware Subgraph Aggregator. To handle different relations in the knowledge
graph, the relation-aware subgraph aggregators split the neighborhood graph into multiple subgraphs
such that all edges of a subgraph belong to only one of the relation types in R. Each subgraph is
assigned with an aggregator characterized by a unique set of parameters, and its information is
extracted by this aggregator to produce a relation-sensitive context representation. We use nur to
denote the representation generated from a subgraph with type r edges. Finally, {nur 1 , nur 2 , . . .} are
further fused together to produce the overall context representation for target node u. A typical
example is shown in Figure 4b.
For instance, the aggregator of RecoGCN [41] is formulated as:

α^r_{v,u} = softmax_v{W_{r1} z_v · W_{r2} z_u | v ∈ N_r(u)}   (3)

n_u^r = Σ_{v∈N_r(u)} α^r_{v,u} z_v   (4)

n_u = Σ_r n_u^r   (5)
where Wr1 and Wr2 are two learnable transformations assigned to the relation type r ∈ R.
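A sketch of this per-relation attention and fusion (Eqs. 3-5), assuming the neighborhood and parameters are stored in dictionaries keyed by relation type; the relation names and toy parameters are made up for illustration:

```python
import numpy as np

def relation_subgraph_context(z_u, nbrs_by_relation, Wr1, Wr2):
    """Relation-aware subgraph aggregation of Eqs. (3)-(5): per relation r,
    attention scores softmax_v{W_r1 z_v . W_r2 z_u} weight the neighbors of
    that relation; the per-relation contexts n_u^r are then summed."""
    n_u = 0.0
    for r, neighbor_embs in nbrs_by_relation.items():
        Z = np.stack(neighbor_embs)                   # (|N_r(u)|, d)
        scores = (Z @ Wr1[r].T) @ (Wr2[r] @ z_u)      # bilinear scores per neighbor
        alphas = np.exp(scores - scores.max())
        alphas /= alphas.sum()                        # softmax over v (Eq. 3)
        n_u = n_u + alphas @ Z                        # n_u^r, summed over r (Eqs. 4-5)
    return n_u

d = 3
rng = np.random.default_rng(2)
nbrs = {"directs": [rng.normal(size=d)], "likes": [rng.normal(size=d) for _ in range(2)]}
W1 = {r: rng.normal(size=(d, d)) for r in nbrs}
W2 = {r: rng.normal(size=(d, d)) for r in nbrs}
print(relation_subgraph_context(rng.normal(size=d), nbrs, W1, W2))
```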
To boost the performance in recommender systems, the STAR-GCN architecture introduced by Zhang et al. [45] employs a multi-link graph convolutional encoder to learn node representations. Each relation type r ∈ R is assigned a specific transformation. Its aggregator is hence formulated as:

n_u^r = Σ_{v∈N_r(u)} (1 / √(|N_r(u)| · |N_r(v)|)) W_r z_v   (6)

n_u = Σ_r n_u^r   (7)
where {Wr |r ∈ R} are the parameters to learn.
IntentGC proposed by Zhao et al. [46] is a recommendation framework that captures both the
explicit user preference and heterogeneous relationships of auxiliary information. To perform large-
scale recommendation, IntentGC introduces a vector-wise aggregator:
n_u^r = Avg{z_v | v ∈ N_r(u)}   (8)

n_u = Σ_{r=1}^{|R|−2} W_r n_u^r   (9)
This aggregator is computationally efficient since it avoids unnecessary computations by replacing
the expensive operation W · (n_u^{r_1} || n_u^{r_2} || · · ·) with the summation of multiple small matrix products.
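The relationship between the two forms can be seen with a small NumPy check: when the big matrix is block-structured, W · (n_u^{r_1} || n_u^{r_2} || · · ·) equals the sum of the small per-relation products, which is the restricted, cheaper parameterization the vector-wise aggregator adopts (the block construction here is an illustrative assumption, not IntentGC's exact parameterization):

```python
import numpy as np

rng = np.random.default_rng(3)
k, num_rel = 4, 3
n_r = [rng.normal(size=k) for _ in range(num_rel)]      # per-relation contexts n_u^r

# concatenation form: one big matrix times one long concatenated vector
W_big = rng.normal(size=(k, k * num_rel))
concat_out = W_big @ np.concatenate(n_r)

# vector-wise form of Eq. (9): sum of small matrix-vector products,
# using the column blocks of W_big as the per-relation matrices W_r
blocks = np.split(W_big, num_rel, axis=1)
vectorwise_out = sum(Wr @ nr for Wr, nr in zip(blocks, n_r))

print(np.allclose(concat_out, vectorwise_out))          # True: same result
```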
3.1.3 Relation-aware Attentive Aggregator. In contrast to the other two types of aggregators,
relation-aware attentive aggregators convert the neighborhood graph into a weighted graph, where the
f (zu ,ze ,zv )
weight of each edge is a function of corresponding knowledge triplet, e.g., w = Í ′ ,
v ∈N (u) f (zu ,ze ′ ,zv ′ )
w = f (zu , ze , zv ), etc. These weights capture the rich semantic information encoded in the edges of a
knowledge graph, and measure the relatedness of different knowledge triplets to the target node u.
Wang et al. [30] propose an end-to-end framework KGNN-LS that utilizes a knowledge graph as an additional information source to improve recommendation. For a given user, KGNN-LS applies a trainable function to identify important knowledge graph relationships and converts the knowledge graph into a user-specific weighted graph. The aggregator in KGNN-LS is hence formulated as:

A' = A + I   (10)

N = D^{−1/2} A' D^{−1/2} X   (11)

where A_{i,j} = f(z_{e_{i,j}}), D is a diagonal matrix with D_{i,i} = Σ_j A'_{i,j}, and f is a user-specific scoring function.
Li et al. [15] introduce a novel framework named V2HT, which is capable of boosting the
performance of micro-video hashtag recommendation by jointly considering the sequential feature
learning, the video-user-hashtag interaction, and the hashtag correlations. Similar to KGNN-LS, they
assign different weights on different types of edges to transform the knowledge graph into a weighted
graph. Specifically, the aggregator used in V2HT is:
A_{i,j} = f_weight(X[i, :], X[j, :], e_{i,j})   (12)

A'_{i,j} = α · A_{i,j} / Σ_{j'} A_{i,j'}  if i ≠ j,   A'_{i,j} = 1 − α  otherwise   (13)

N = D̃'^{−1/2} A' D̃'^{−1/2} X   (14)

where the subscripts i and j denote the i-th and j-th nodes in the graph, and D̃' is a diagonal matrix with D̃'_{i,i} = Σ_j A'_{i,j}. Here α determines the weights assigned to a node itself and to other correlated nodes.
When α → 1, the feature of a node itself will be ignored; when α → 0, neighboring information will
be ignored.
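A compact sketch of the propagation step shared by Eqs. (10)-(11) and (14), i.e., a symmetrically normalized multiplication of a (weighted) adjacency matrix with the node feature matrix; the self-loop weighting below follows Eq. (10) rather than V2HT's α-mixing of Eq. (13):

```python
import numpy as np

def propagate(A, X):
    """Symmetric normalization and one propagation step:
    N = D^{-1/2} A' D^{-1/2} X, where A' is the (weighted) adjacency matrix
    with added self-connections and D is its diagonal degree matrix."""
    A_prime = A + np.eye(A.shape[0])        # add self-loops (Eq. 10)
    d = A_prime.sum(axis=1)                 # D_ii = sum_j A'_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_prime @ D_inv_sqrt @ X

# toy weighted graph with 3 nodes and 2-dimensional features
A = np.array([[0.0, 0.5, 0.0], [0.5, 0.0, 2.0], [0.0, 2.0, 0.0]])
X = np.arange(6, dtype=float).reshape(3, 2)
print(propagate(A, X))
```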
Fan et al. [6] propose a novel graph neural network framework GraphRec for social recommenda-
tion. The knowledge graph in social recommendation usually consists of two graphs: a social graph
denoting the relationship between users and a user-item graph denoting interactions between users
and items. GraphRec introduces three aggregators to process these two different graphs, i.e., user
aggregator, item aggregator and social aggregator. The user and item aggregators extract information
from the user-item graph while the social aggregator distills information from the social graph. These
aggregators work in a similar way. Thus we only explain the item aggregator here for simplicity.
Mathematically, the item aggregator is formulated as:
x_{u,v} = MLP(z_v || z_e)   (15)

α*_{u,v} = W_2^T · γ(W_1 · (x_{u,v} || z_u) + b_1) + b_2   (16)

α_{u,v} = exp(α*_{u,v}) / Σ_{v'∈N(u)} exp(α*_{u,v'})   (17)

n_u = Σ_{v∈N(u)} α_{u,v} x_{u,v}   (18)

where α_{u,v} represents the attention weight of the interaction with v in contributing to u's context representation.
Recommendation in online communities faces two challenges: 1) users' interests are dynamic, and 2) users are influenced by their friends. To address these challenges, Song et al. [23] introduce a novel framework DGRec based on a dynamic graph attention neural network. Specifically, they first model dynamic user behaviors with a recurrent neural network, whose outputs are used as the initial graph node embeddings. Then, a graph attention neural network is adopted to model context-dependent social influence, which dynamically infers the influencers based on the user's current interest. The aggregator in this graph attention neural network is:

α_{u,v} = exp(z_u · z_v) / Σ_{v'∈N(u)} exp(z_u · z_{v'})   (19)

n_u = Σ_{v∈N(u)} α_{u,v} z_v   (20)
The KGCN framework proposed by Wang et al. [31] is an end-to-end recommender system
that automatically discovers the high-order structure information and semantic information of
Knowledge Graph (KG). The key idea of KGCN is to aggregate and incorporate neighborhood
information with bias when calculating the representation for a given node in the knowledge graph.
It is implemented by weighting neighbors with scores dependent on the connecting relations and
nodes, which characterizes both the semantic information of KG and user’s personalized interests in
relations. Specifically, the aggregator in KGCN is:
α*_{u,v} = д(z_v, z_e)   (21)

α_{u,v} = exp(α*_{u,v}) / Σ_{v'∈N(u)} exp(α*_{u,v'})   (22)

n_u = Σ_{v∈N(u)} α_{u,v} z_v   (23)
Wang et al. [33] propose a method named KGAT to explicitly model the high-order connectivities
in KG in an end-to-end fashion. The aggregator in KGAT works in a similar way as that of KGCN.
The only difference is that it takes all the three embeddings zu , ze , and zv of a knowledge triplet as
input to compute the attention score:
α*_{u,v} = (W_e z_v)^T tanh(W_e z_u + z_e)   (24)

α_{u,v} = exp(α*_{u,v}) / Σ_{v'∈N(u)} exp(α*_{u,v'})   (25)

n_u = Σ_{v∈N(u)} α_{u,v} z_v   (26)
Here We is a projection matrix to learn, which transforms inputs from the node embedding space to
the edge embedding space.
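A sketch of this attentive aggregation (Eqs. 24-26), assuming node and edge embeddings of equal dimension so a single square W_e suffices; all names and toy data are illustrative:

```python
import numpy as np

def kgat_style_context(z_u, triplets, We):
    """KGAT-style attentive aggregation of Eqs. (24)-(26): the score of each
    knowledge triplet (u, e, v) is (We z_v)^T tanh(We z_u + z_e); a softmax
    over u's neighborhood turns the scores into attention weights."""
    Zv = np.stack([zv for zv, _ in triplets])            # neighbor embeddings
    Ze = np.stack([ze for _, ze in triplets])            # edge embeddings
    scores = np.einsum("nd,nd->n", Zv @ We.T, np.tanh(z_u @ We.T + Ze))
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                               # softmax (Eq. 25)
    return alphas @ Zv                                   # n_u (Eq. 26)

d = 3
rng = np.random.default_rng(4)
trips = [(rng.normal(size=d), rng.normal(size=d)) for _ in range(4)]
print(kgat_style_context(rng.normal(size=d), trips, rng.normal(size=(d, d))))
```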
In social recommendation, there are four different social effects: two-fold effects in the user domain, i.e., a static effect from social homophily and a dynamic effect from social influence, and the symmetric two-fold effects in the item domain. To capture these four effects, Wu et al. [36] propose a framework named DANSER, which is composed of two dual Graph ATtention networks (GATs) [27]: one dual GAT for users, including a GAT to capture social homophily and a GAT to capture social influence, and the other dual GAT for items. The aggregators in DANSER work in a similar way; we show one of them as an example:

α_{u,v} = LeakyReLU(w_u(W_e z_e ⊙ (W_p z_u || W_p z_v))) / Σ_{v'∈N(u)} LeakyReLU(w_u(W_e z_e ⊙ (W_p z_u || W_p z_{v'})))   (27)

n_u = Σ_{v∈N(u)} α_{u,v} z_v   (28)
where Wp , We , and wu are two weight matrices and one weight vector to learn.
3.1.4 Discussion: Compared to the other two types of aggregators, the relation-unaware aggrega-
tor loses the rich semantic information encoded in the edges of a Knowledge Graph (KG), which
limits the model performance. On the other hand, although the relation-aware subgraph aggregator is
able to model various semantic relations in a KG, it has several limitations. First, as each relation type r ∈ R is assigned a unique set of parameters, the number of training parameters in this type of aggregator significantly increases when the relation type set R becomes larger. Methods utilizing relation-aware subgraph aggregators are hence not suitable for processing knowledge graphs with numerous types of relations. Second, these aggregators process the subgraphs of different relation types independently. The correlation between different relation types is hence not encoded in the learned context representation n_u, which limits the recommendation performance. The relation-aware attentive aggregator utilizes a single function to compute attention scores for different relation types in the KG, and hence can be applied to large-scale knowledge graphs. Meanwhile, it also improves the explainability of deep recommender systems by weighting the contributions of the different nodes neighboring the target node u.
3.2 Updater
3.2.1 Context-only Updater. For any node u in the knowledge graph, the context-only updater
only receives u's context representation n_u as input, and produces a new representation z_u^new = f(n_u)
for this node. Here f is the updater function.
MEIRec [4] simply takes n_u as the new embedding for node u, i.e., z_u^new = n_u, which is computationally efficient but may not be optimal. To overcome this issue, GraphRec [6], V2HT [15], DGRec [23], KGCN [31], and DANSER [36] propose to approximate f via an MLP:

z_u^new = γ(W · n_u + b)   (29)

where γ is a non-linear activation function such as ReLU, and the bias b could be set to 0. STAR-GCN [45] improves the non-linearity of f by applying another activation function on n_u before sending it to the MLP layer:

z_u^new = γ(W · γ(n_u) + b)   (30)
3.2.2 Single-interaction Updater. For any node u in the knowledge graph, the single-interaction
updater takes both u’s context representation nu and u’s current embedding zu as input to compute
a new representation z_u^new = f(n_u, z_u). Here f is a function that involves a binary operator, such as sum, concatenation, etc., which is applied to both n_u and z_u. This operator builds an interaction between
the node u and its context, and could potentially improve the model performance.
KGNN-LS [30] and KGCN [31] both utilize summation as the interaction operator. The only difference is that KGNN-LS applies a scaling operation on the input node embeddings before feeding them to the operator. Their updaters can be written as

z_u^new = γ(W · (λz_u + n_u) + b)   (31)

where λ is the scaling factor and b is a bias that could be set to 0.
RecoGCN [41] and PinSage [42] replace the summation operator in the above updaters with the concatenation operation. This improves the expressiveness of the updater by doubling the number of learnable parameters, but also increases the computation cost:

z_u^new = ReLU(W · (z_u || n_u) + b)   (32)
3.2.3 Multi-interaction Updater. The multi-interaction updater is a combination of multiple single-interaction updaters with different binary operators, i.e., z_u^new = д(f_1(n_u, z_u), f_2(n_u, z_u), . . .). Here д is a function that fuses the representations obtained from the different single-interaction updaters. This type of updater improves feature interactions and may lead to better performance.
The only method in the current literature that utilizes a multi-interaction updater is KGAT [33]. It considers two different operators: summation and element-wise product. The mathematical formulation of this updater is

z_u^new = LeakyReLU(W_1 · (z_u + n_u)) + LeakyReLU(W_2 · (z_u ⊙ n_u))   (33)
The authors choose element-wise product because it makes the information being propagated in the
graph sensitive to the affinity between zu and nu , e.g., passing more information from similar nodes.
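The three updater families can be contrasted in a few lines; the sketch below assumes ReLU/LeakyReLU activations and toy weights, and is not any surveyed system's exact implementation:

```python
import numpy as np

def relu(x): return np.maximum(0.0, x)
def leaky_relu(x, slope=0.2): return np.where(x > 0, x, slope * x)

def context_only(n_u, W, b):
    # Eq. (29): the target node's own embedding z_u is not used at all
    return relu(W @ n_u + b)

def single_interaction(z_u, n_u, W, b):
    # Eq. (32): concatenate z_u with its context before the linear layer
    return relu(W @ np.concatenate([z_u, n_u]) + b)

def multi_interaction(z_u, n_u, W1, W2):
    # Eq. (33): fuse a sum interaction with an element-wise product interaction
    return leaky_relu(W1 @ (z_u + n_u)) + leaky_relu(W2 @ (z_u * n_u))

k = 3
rng = np.random.default_rng(5)
z_u, n_u = rng.normal(size=k), rng.normal(size=k)
print(context_only(n_u, rng.normal(size=(k, k)), np.zeros(k)))
print(single_interaction(z_u, n_u, rng.normal(size=(k, 2 * k)), np.zeros(k)))
print(multi_interaction(z_u, n_u, rng.normal(size=(k, k)), rng.normal(size=(k, k))))
```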
3.2.4 Discussion: Since the information of the target node u is missing in the inputs of context-
only updaters, they may not be sufficient to model the interaction between u and its context, which
limits the quality of learned representations. Single-interaction updaters address this issue by manually
encoding the feature interactions between zu and nu . Multi-interaction updaters further improve
it by considering multiple binary operators at once. In general, more attention should be paid to the multi-interaction updaters, especially to the fusion function д. For example, instead of using a simple
operation like summation, is it possible to learn a fusion function д?
4.1.1 Uniform Term Embedding. MEIRec [4] proposed by Fan et al. is an intent recommendation
framework that recommends the most relevant intent (i.e., query) to a user based on his/her click
history. The authors find that queries and titles of items are composed of terms, and that the number of distinct terms is not large. They therefore propose to represent the queries/titles with a small number of term embeddings, which significantly reduces the number of training parameters in the model. New queries/items that have never been searched/added before can also be represented by these terms.
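A toy sketch of this idea, assuming a query or title is simply averaged over the embeddings of its (whitespace-tokenized) terms; the vocabulary and dimensions are made up:

```python
import numpy as np

def text_embedding(text, term_embs, dim=8):
    """Uniform term embedding sketch: a query or item title is represented by
    the average of its term embeddings, so unseen queries/titles can still be
    embedded as long as their constituent terms are known."""
    vecs = [term_embs[t] for t in text.lower().split() if t in term_embs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

rng = np.random.default_rng(6)
term_embs = {t: rng.normal(size=8) for t in ["red", "running", "shoes", "men"]}
print(text_embedding("red running shoes", term_embs))
```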
4.2 Scalability
In most existing graph neural networks, the aggregator has to visit the full neighborhood of a
node to produce its node embedding, which is computationally intractable for large-scale knowledge
graphs. In the following, we show some solutions proposed by the studied methods to this scalability
issue.
4.2.1 Important Node Sampling. Instead of examining k-hop graph neighborhood to compute
node embeddings, PinSage [42] defines importance-based neighborhoods, where the neighborhood
of a node u is defined as the T nodes that exert the most influence on node u. Specifically, it simulates
random walks starting from u and computes the L1-normalized visit count for nodes visited by
random walks. The neighborhood of u is hence the top T nodes with the highest normalized visit
counts.
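A small sketch of this importance-based neighborhood selection, assuming an unweighted adjacency-list graph and uniform random walks; the walk hyperparameters are arbitrary:

```python
import random
from collections import Counter

def important_neighbors(adj, u, T=3, num_walks=200, walk_len=4, seed=0):
    """Simulate short random walks from u, L1-normalize the visit counts,
    and keep the top-T visited nodes as u's importance-based neighborhood."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_walks):
        node = u
        for _ in range(walk_len):
            node = rng.choice(adj[node])
            if node != u:
                counts[node] += 1
    total = sum(counts.values())
    norm = {v: c / total for v, c in counts.items()}   # L1-normalized counts
    top = sorted(norm, key=norm.get, reverse=True)[:T]
    return [(v, norm[v]) for v in top]

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
print(important_neighbors(adj, u=0))
```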
4.2.2 Meta-path Defined Receptive Field. MEIRec [4] and RecoGCN [41] propose to leverage semantic-aware meta-paths to carve out concise and relevant receptive fields for each node, which is referred to as the meta-path defined receptive field.
Definition 4.1. Meta-path [41]. A meta-path ρ is defined as a path in a knowledge graph of the form t_1 --r_1--> t_2 --r_2--> · · · --r_l--> t_{l+1}, where there is a composite relation R = r_1 ◦ r_2 ◦ · · · ◦ r_l between node types t_1 and t_{l+1}.
Definition 4.2. Meta-path defined Receptive Field (MRF) [41]. Given a knowledge graph G = (V, E), for a node v and a meta-path ρ of length l, the meta-path defined receptive field F_v^ρ = {f_v^ρ(0), f_v^ρ(1), · · · , f_v^ρ(l)} is defined as the set of nodes that can be travelled to or passed by from node v via the meta-path ρ, where f_v^ρ(k) denotes the set of nodes reached by k jumps on ρ.
Figure 6 shows an example of MRF. Compared to k-hop graph neighborhood, MRF focuses only
on the relations selected based on prior knowledge, and accelerates the training by greatly reducing
the number of nodes in the computation graph. Moreover, it is possible to enlarge the receptive field
of a node u by computing multiple embeddings along different meta-paths and fusing them together
to obtain a final representation:
z_u^new = д(z_{u,ρ_1}^new, z_{u,ρ_2}^new, . . . , z_{u,ρ_K}^new)   (34)
where д is a fusion function. Note that the computations of these embeddings are independent of each other, and hence can be done in parallel.
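A minimal sketch of computing an MRF by following a meta-path hop by hop, assuming the KG is stored as adjacency lists of (relation, neighbor) pairs; the toy relations are hypothetical:

```python
def meta_path_receptive_field(kg, v, meta_path):
    """Starting from node v, follow the relation types of the meta-path;
    f_v(k) is the set of nodes reached after k hops.

    kg:        dict node -> list of (relation, neighbor) pairs
    meta_path: sequence of relation types [r1, r2, ..., rl]
    """
    frontier = {v}
    field = [frontier]                     # f_v(0) = {v}
    for r in meta_path:
        frontier = {w for u in frontier for (rel, w) in kg[u] if rel == r}
        field.append(frontier)             # f_v(k) after k hops on the path
    return field

# toy KG: user -buys-> item -belongs_to-> category
kg = {"u1": [("buys", "i1"), ("buys", "i2")],
      "i1": [("belongs_to", "c1")], "i2": [("belongs_to", "c2")],
      "c1": [], "c2": []}
print(meta_path_receptive_field(kg, "u1", ["buys", "belongs_to"]))
```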
4.2.3 Vector-wise Aggregator and Updater. To remove the limitation of training on clustered
mini-graphs for large-scale graphs, IntentGC [46] introduces a special graph neural network, which
replaces the normal aggregator and updater in GNNs with a vector-wise aggregator and a vector-wise
updater. The key idea is to avoid unnecessary feature interactions by replacing the expensive matrix
multiplication between a huge weight matrix and a giant feature vector (formed by concatenation
of many vectors) with a summation of multiple small matrix products. The detailed formulation is
shown in Table 3.
4.3 Personalization
4.3.1 Graph Translation. KGNN-LS [30] performs personalized recommendation by converting
the knowledge graph into a user-specific weighted graph, on which a graph neural network is applied
to compute personalized item embeddings. The weights of edges on this new graph are computed by a trainable function f(z_e) that identifies important knowledge graph relationships for a given user. We refer to this technique as graph translation.

Scenario         Dataset         # Entities   # Connections   # Relation Types   Citation
Book             Amazon-books    95,594       847,733         39                 [28], [46]
Book             Book-crossing   25,787       60,787          18                 [30], [31]
Citation         S2              -            -               -                  [40]
Movie            Douban          46,423       331,315         5                  [23], [45]
Movie            Flixter         1,049,000    26,700,000      -                  [45]
Movie            MovieLens       102,569      499,474         32                 [17], [25], [30], [45]
Music            Last-FM         9,336        15,518          60                 [30], [33]
Music            Delicious       5,932        15,328          -                  [37]
POI              Yelp            159,426      6,818,026       6                  [23], [25]
POI              Dianping        28,115       160,519         7                  [30]
Social Network   Epinions        175,000      508,000         -                  [6], [36]
4.4 Dynamicity
A knowledge graph could evolve dynamically over time, i.e., its nodes/edges may appear or disappear.
For example, in the scenario of social recommendation, a user’s friend list may change from time
to time. When a user adds a number of new friends with similar interests, the recommender system
should update its recommendation strategy accordingly and reflect this change in its results.
To address this issue, Song et al. [23] consider a dynamic feature graph setting. Specifically, for each user, they construct a graph where each node represents either this user or one of his/her friends. If user u has |N(u)| friends, then the total number of nodes in this graph is |N(u)| + 1. The node features of the friends in this graph are kept unchanged, but that of user u is updated whenever u consumes a new item. Moreover, to capture the context-dependent social influence, the authors propose a graph attention neural network, which utilizes an attention mechanism to guide the influence propagation in its aggregator. Each friend of user u is assigned an attention weight which measures his/her level of influence.
6 FUTURE DIRECTIONS
Although existing works have established a solid foundation for GNN-based Knowledge-Aware
Deep Recommender (GNN-KADR) systems, it is still a young and promising research field. In this
section, we suggest several prospective future research directions.
6.2 Dynamicity
Knowledge graphs are in nature dynamic in the sense that nodes and edges may appear or disappear, and that node/edge inputs may change from time to time [38]. Moreover, the types of relations between nodes in a KG may also change over time in real-world scenarios. For example, in agent-initiated social e-commerce, a user may become a selling agent for his/her friends someday in the future. Although DGRec proposed by Song et al. [23] has partially addressed the dynamicity of graphs, few GNN-KADR systems consider accommodating their frameworks to cases where the knowledge graph contains dynamic spatial and semantic relations. New aggregators and updaters are needed to adapt to the dynamicity of knowledge graphs. According to our literature survey, we argue that this is still an open area for future research.
introduced by the knowledge graph. Therefore, building a fair GNN-KADR system is a promising but largely unexplored area where more studies are expected.
7 CONCLUSION
In this survey, we conduct a comprehensive overview of GNN-based Knowledge Aware Deep
Recommender (GNN-KADR) systems. We provide a taxonomy for their core component, i.e., the
graph embedding module, which is usually a GNN in the state-of-the-art frameworks. According
to the taxonomy, we categorize the aggregators of these GNNs into three groups: relation-unaware
aggregator, relation-aware subgraph aggregator and relation-aware attentive aggregator. We also
divide their updaters into three categories: context-only updater, single-interaction updater, and
multi-interaction updater. We provide a thorough review, comparison and summarization of these systems within and across categories. Then, we discuss the solutions proposed by these frameworks to common practical recommendation issues such as cold-start, scalability and so on. Datasets,
open-source codes and benchmarks for GNN-KADR systems are also summarized. Finally, we
suggest future research directions in this rapidly growing field.
REFERENCES
[1] Ayse Merve Acilar and Ahmet Arslan. 2009. A collaborative filtering method based on artificial immune network.
Expert Syst. Appl. 36, 4 (2009), 8324–8332. https://doi.org/10.1016/j.eswa.2008.10.029
[2] Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, and
Cristos Goodrow. 2019. Fairness in Recommendation Ranking through Pairwise Comparisons. In Proceedings of the
25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK,
USA, August 4-8, 2019. 2212–2220. https://doi.org/10.1145/3292500.3330745
[3] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In
Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, September 15-19, 2016.
191–198. https://doi.org/10.1145/2959100.2959190
[4] Shaohua Fan, Junxiong Zhu, Xiaotian Han, Chuan Shi, Linmei Hu, Biyu Ma, and Yongliang Li. 2019. Metapath-
guided Heterogeneous Graph Neural Network for Intent Recommendation. In Proceedings of the 25th ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining. 2478–2486.
[5] Shaohua Fan, Junxiong Zhu, Xiaotian Han, Chuan Shi, Linmei Hu, Biyu Ma, and Yongliang Li. 2019. Metapath-
guided Heterogeneous Graph Neural Network for Intent Recommendation. In Proceedings of the 25th ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019.
2478–2486. https://doi.org/10.1145/3292500.3330673
[6] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph neural networks for social
recommendation. In The World Wide Web Conference. 417–426.
[7] Golnoosh Farnadi, Pigi Kouki, Spencer K. Thompson, Sriram Srinivasan, and Lise Getoor. 2018. A Fairness-aware
Hybrid Recommender System. CoRR abs/1809.09030 (2018). arXiv:1809.09030 http://arxiv.org/abs/1809.09030
[8] Aleksandr Farseev, Ivan Samborskii, Andrey Filchenkov, and Tat-Seng Chua. 2017. Cross-domain recommendation via
clustering on multi-layer graphs. In Proceedings of the 40th International ACM SIGIR Conference on Research and
Development in Information Retrieval. 195–204.
[9] Sahin Cem Geyik, Stuart Ambler, and Krishnaram Kenthapadi. 2019. Fairness-Aware Ranking in Search & Recommen-
dation Systems with Application to LinkedIn Talent Search. In Proceedings of the 25th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019. 2221–2231.
https://doi.org/10.1145/3292500.3330691
[10] Xiaowen Huang, Quan Fang, Shengsheng Qian, Jitao Sang, Yan Li, and Changsheng Xu. 2019. Explainable Interaction-
driven User Modeling over Knowledge Graph for Sequential Recommendation. In Proceedings of the 27th ACM
International Conference on Multimedia, MM 2019, Nice, France, October 21-25, 2019. 548–556. https://doi.org/10.
1145/3343031.3350893
[11] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender systems: an
introduction. Cambridge University Press.
[12] Zhuoren Jiang, Yue Yin, Liangcai Gao, Yao Lu, and Xiaozhong Liu. 2018. Cross-language citation recommendation
via hierarchical representation learning on heterogeneous graph. In The 41st International ACM SIGIR Conference on
Research & Development in Information Retrieval. 635–644.
[13] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv
preprint arXiv:1609.02907 (2016).
[14] Jurek Leonhardt, Avishek Anand, and Megha Khosla. 2018. User Fairness in Recommender Systems. In Companion of
the The Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon , France, April 23-27, 2018. 101–102.
https://doi.org/10.1145/3184558.3186949
[15] Mengmeng Li, Tian Gan, Meng Liu, Zhiyong Cheng, Jianhua Yin, and Liqiang Nie. 2019. Long-tail Hashtag Rec-
ommendation for Micro-videos with Graph Convolutional Network. In Proceedings of the 28th ACM International
Conference on Information and Knowledge Management. 509–518.
[16] Weizhi Ma, Min Zhang, Yue Cao, Woojeong Jin, Chenyang Wang, Yiqun Liu, Shaoping Ma, and Xiang Ren. 2019.
Jointly Learning Explainable Rules for Recommendation with Knowledge Graph. In The World Wide Web Conference,
WWW 2019, San Francisco, CA, USA, May 13-17, 2019. 1210–1221. https://doi.org/10.1145/3308558.3313607
[17] Enrico Palumbo, Giuseppe Rizzo, and Raphaël Troncy. 2017. Entity2rec: Learning user-item relatedness from knowledge
graphs for top-n item recommendation. In Proceedings of the eleventh ACM conference on recommender systems. 32–36.
[18] Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, and Heiko Hoffmann. 2019. Explainability
Methods for Graph Convolutional Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition,
CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. 10772–10781. https://doi.org/10.1109/CVPR.2019.01103
[19] Sungyong Seo, Jing Huang, Hao Yang, and Yan Liu. 2017. Interpretable Convolutional Neural Networks with Dual Local
and Global Attention for Review Rating Prediction. In Proceedings of the Eleventh ACM Conference on Recommender
Systems, RecSys 2017, Como, Italy, August 27-31, 2017. 297–305. https://doi.org/10.1145/3109859.3109890
[20] Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and Philip S. Yu. 2017. A Survey of Heterogeneous Information
Network Analysis. IEEE Trans. Knowl. Data Eng. 29, 1 (2017), 17–37. https://doi.org/10.1109/TKDE.2016.2598561
[21] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep Inside Convolutional Networks: Visualising
Image Classification Models and Saliency Maps. In 2nd International Conference on Learning Representations, ICLR
2014, Banff, AB, Canada, April 14-16, 2014, Workshop Track Proceedings. http://arxiv.org/abs/1312.6034
[22] Joan E. Solsman. 2018. Ever get caught in an unexpected hourlong YouTube binge? Thank YouTube AI for that.
https://www.cnet.com/news/youtube-ces-2018-neal-mohan/
[23] Weiping Song, Zhiping Xiao, Yifan Wang, Laurent Charlin, Ming Zhang, and Jian Tang. 2019. Session-Based
Social Recommendation via Dynamic Graph Attention Networks. In Proceedings of the Twelfth ACM International
Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 2019. 555–563.
https://doi.org/10.1145/3289600.3290989
[24] Zhu Sun, Jie Yang, Jie Zhang, Alessandro Bozzon, Long-Kai Huang, and Chi Xu. 2018. Recurrent knowledge graph
embedding for effective recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems,
RecSys 2018, Vancouver, BC, Canada, October 2-7, 2018. 297–305. https://doi.org/10.1145/3240323.3240361
[25] Zhu Sun, Jie Yang, Jie Zhang, Alessandro Bozzon, Long-Kai Huang, and Chi Xu. 2018. Recurrent knowledge graph
embedding for effective recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems.
297–305.
[26] Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. 2018. Latent Relational Metric Learning via Memory-based Attention for
Collaborative Ranking. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon,
France, April 23-27, 2018. 729–739. https://doi.org/10.1145/3178876.3186154
[27] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph
Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada,
April 30 - May 3, 2018, Conference Track Proceedings. https://openreview.net/forum?id=rJXMpikCZ
[28] Chengwei Wang, Tengfei Zhou, Chen Chen, Tianlei Hu, and Gang Chen. 2019. CAMO: A Collaborative Ranking
Method for Content Based Recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33.
5224–5231.
[29] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep knowledge-aware network for news
recommendation. In Proceedings of the 2018 world wide web conference. 1835–1844.
[30] Hongwei Wang, Fuzheng Zhang, Mengdi Zhang, Jure Leskovec, Miao Zhao, Wenjie Li, and Zhongyuan Wang. 2019.
Knowledge-aware graph neural networks with label smoothness regularization for recommender systems. In Proceedings
of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 968–977.
[31] Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, and Minyi Guo. 2019. Knowledge Graph Convolutional Networks for
Recommender Systems. In The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019.
3307–3313. https://doi.org/10.1145/3308558.3313417
[32] Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge Graph Attention
Network for Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019. 950–958. https://doi.org/10.1145/
3292500.3330989
[33] Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. Kgat: Knowledge graph attention network
for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining. 950–958.
[34] Xiang Wang, Xiangnan He, Liqiang Nie, and Tat-Seng Chua. 2017. Item silk road: Recommending items from
information domains to social users. In Proceedings of the 40th International ACM SIGIR conference on Research and
Development in Information Retrieval. 185–194.
[35] Xiang Wang, Dingxian Wang, Canran Xu, Xiangnan He, Yixin Cao, and Tat-Seng Chua. 2019. Explainable Reasoning
over Knowledge Graphs for Recommendation. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI
2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI
Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 -
February 1, 2019. 5329–5336. https://doi.org/10.1609/aaai.v33i01.33015329
[36] Qitian Wu, Hengrui Zhang, Xiaofeng Gao, Peng He, Paul Weng, Han Gao, and Guihai Chen. 2019. Dual graph attention
networks for deep latent representation of multifaceted social effects in recommender systems. In The World Wide Web
Conference. 2091–2102.
[37] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based recommendation
with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 346–353.
[38] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2019. A Comprehensive
Survey on Graph Neural Networks. CoRR abs/1901.00596 (2019). arXiv:1901.00596 http://arxiv.org/abs/1901.00596
[39] Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional Factorization
Machines: Learning the Weight of Feature Interactions via Attention Networks. In Proceedings of the Twenty-Sixth
International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017.
3119–3125. https://doi.org/10.24963/ijcai.2017/435
[40] Chenyan Xiong, Russell Power, and Jamie Callan. 2017. Explicit semantic ranking for academic search via knowledge
graph embedding. In Proceedings of the 26th international conference on world wide web. 1271–1279.
[41] Fengli Xu, Jianxun Lian, Zhenyu Han, Yong Li, Yujian Xu, and Xing Xie. 2019. Relation-Aware Graph Convolutional
Networks for Agent-Initiated Social E-Commerce Recommendation. In Proceedings of the 28th ACM International
Conference on Information and Knowledge Management. 529–538.
[42] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph con-
volutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining. 974–983.
[43] Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. 2019. GNNExplainer: Generating
Explanations for Graph Neural Networks. In Advances in Neural Information Processing Systems 32: Annual Conference
on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada. 9240–
9251. http://papers.nips.cc/paper/9123-gnnexplainer-generating-explanations-for-graph-neural-networks
[44] Jianming Zhang, Zhe L. Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. 2016. Top-Down Neural Attention
by Excitation Backprop. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands,
October 11-14, 2016, Proceedings, Part IV. 543–559. https://doi.org/10.1007/978-3-319-46493-0_33
[45] Jiani Zhang, Xingjian Shi, Shenglin Zhao, and Irwin King. 2019. STAR-GCN: Stacked and Reconstructed Graph
Convolutional Networks for Recommender Systems. In Proceedings of the Twenty-Eighth International Joint Conference
on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. 4264–4270. https://doi.org/10.24963/ijcai.
2019/592
[46] Jun Zhao, Zhou Zhou, Ziyu Guan, Wei Zhao, Wei Ning, Guang Qiu, and Xiaofei He. 2019. IntentGC: a Scalable Graph
Convolution Framework Fusing Heterogeneous Information for Recommendation. In Proceedings of the 25th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining. 2347–2357.