[go: up one dir, main page]

0% found this document useful (0 votes)
171 views21 pages

Knowledge Graph Recommender

Uploaded by

Stanislav Se
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
171 views21 pages

Knowledge Graph Recommender

Uploaded by

Stanislav Se
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 21

Deep Learning on Knowledge Graph for Recommender

System: A Survey
YANG GAO* and YI-FAN LI* , University of Texas at Dallas
arXiv:2004.00387v1 [cs.IR] 25 Mar 2020

YU LIN, University of Texas at Dallas


HANG GAO, University of Maryland Baltimore County
LATIFUR KHAN, University of Texas at Dallas
Recent advances in research have demonstrated the effectiveness of knowledge graphs (KG) in providing
valuable external knowledge to improve recommendation systems (RS). A knowledge graph is capable of
encoding high-order relations which connect two objects with one or multiple related attributes. With the
help of the emerging Graph Neural Networks (GNN), it is possible to extract both object characteristics and
relations from KG, which is an essential factor for successful recommendations. In this paper, we provide a
comprehensive survey of the GNN-based knowledge-aware deep recommender systems. Specifically, we discuss
the state-of-the-art frameworks with a focus on their core component, i.e., the graph embedding module, and how
they address practical recommendation issues such as scalability, cold-start and so on. We further summarize the
commonly-used benchmark datasets, evaluation metrics as well as the open-source codes. Finally, we conclude
the survey and propose potential research directions in this rapidly growing field.
Additional Key Words and Phrases: Knowledge Graph, Graph Neural Network (GNN), Recommender System
ACM Reference Format:
Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan. 2020. Deep Learning on Knowledge Graph for
Recommender System: A Survey. 1, 1 (April 2020), 21 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
In modern life, recommender systems (RS) are critical for users to make proper choices from the
huge amount of products or services. For example, people are lured in AI-driven recommendation
service for more than 75% of the time they spent on watching YouTube videos [22]. These systems
attempt to learn the users’ interests and keep them away from over-choice, which significantly boosts
their decision making process [11]. They also help to promote services and sales for business such as
e-commerce.
There are two main architectures for traditional recommender systems: content-based system and
collaborative-filtering based system. Content-based systems accept data that can be represented as
vectors in an euclidean space [38] and measure their similarities based on these vectors. Collaborative-
filtering based systems usually assume that each user-item interaction is an independent instance
with side information encoded [32]. However, in modern society, there is an increasing number of
applications where data are generated from non-Euclidean domains and are represented in the form
* Both authors contribute equally to this research.

Authors’ addresses: Yang Gao, yxg122530@utdallas.edu; Yi-Fan Li, yli@utdallas.edu, University of Texas at Dallas;
Yu Lin, University of Texas at Dallas, yxl163430@utdallas.edu; Hang Gao, University of Maryland Baltimore County,
hanggao1@umbc.edu; Latifur Khan, University of Texas at Dallas, lkhan@utdallas.edu.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting
with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. Request permissions from permissions@acm.org.
© 2020 Association for Computing Machinery.
XXXX-XXXX/2020/4-ART $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn

, Vol. 1, No. 1, Article . Publication date: April 2020.


2 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

Fig. 1. Illustration of user-item interactions and knowledge graph on YouTube, which contains users,
movies, directors and genres as objects; liking, categorizing, directing and writing as object relations

of Knowledge Graphs (KG). This data structure breaks the independent interaction assumption by
linking items with their attributes. For example, in YouTube RS (show in Fig. 1), the connections
between users and movies might be actions such as views, likes and comments. On the other hand,
movies may share the same genre, director, and etc. With these information, a knowledge-aware
RS is hence able to capture not only the user-item interactions but also the rich item-item/user-user
relations to make more accurate recommendations. It is worth noticing that nodes in the KG (e.g.,
person or movies) may have distinct neighborhood size (i.e., number of neighboring nodes), and the
relationships between them could vary as well, which makes recommendation with KG even more
challenging.
Recently, there is an increasing interest in extending deep learning approaches for graph data.
Motivated by CNNs, RNNs, and autoencoders from deep learning, new neural network architectures
have been rapidly developed over the past few years to handle the complexity of graph data. These
types of networks are referred as Graph Neural Networks (GNNs) [38] and are critical for recent
success of knowledge-aware deep recommender systems.
There are a number of existing reviews on the topic of graph neural networks and knowledge
graph analysis. Wu et al. [38] give an overview of graph neural networks in data mining and
machine learning fields. Although it is the first comprehensive review of GNNs, this survey mainly
focuses on the network architectures and only briefly mentions the possibility of applying GNNs for
recommendation. On the other hand, Shi et al. [20] summarize the recommender systems utilizing
the traditional knowledge graph analysis methods. But it misses the most recent development of
GNN-based algorithms. To our best knowledge, our survey is the first review of state-of-the-art
deep recommender systems that utilize GNNs to extract information from knowledge graphs for
improving recommendation.
Our Contributions: Our paper makes notable contributions summarized as follows:
• New Taxonomy: The core component of knowledge-aware deep recommender systems is the
graph embedding module, which is usually a GNN in the state-of-the-art frameworks. Each
layer of a GNN consists of two basic components: Aggregator and Updater. We categorize the
aggregators into three groups: relation-unaware aggregator, relation-aware subgraph aggregator,
and relation-aware attentive aggregator. We also divide the updaters into three categories, i.e.,
context-only updater, single-interaction updater and multi-interaction updater.

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 3

Users Items
Tom Jack

Buys Buys Buys Is Is


Brand Brand

In

Prefers Visits Searches Has Shop Belongs


Prefers Is
Clicks

Query In
Properties Shop Properties Category
Words Has

Visits Searches Has Belongs


Prefers
Co-occurrence

Prefers Prefers
Category Co-occurrence

Mariam Bob

Fig. 2. Example of a knowledge graph on e-commerce websites.

• Comprehensive Overview: We provide the most comprehensive overview of state-of-the-


art GNN-based Knowledge Aware Deep Recommender (GNN-KADR) systems. We provide
detailed descriptions on representative models, make the necessary comparison, and discuss
their solution to practical recommendation issues such as cold start, scalability and so on.
• Resource Collection: We summarize resources regarding GNN-KADR systems, including
benchmark datasets, evaluation metrics and open-source codes.
• Future Directions: We analyze the limitations of existing works and suggest a few future
research directions such as dynamicity, interpretability, fairness and so on.
Organization of Our Survey: The rest of the survey is organized as follows. Section 2 defines
the concepts related to GNN-based knowledge aware recommendations and lists the notations used
in this paper. Section 3 provides an overview of state-of-the-art GNN-KADR systems. Section 4
discusses their solutions to practical recommendation issues. Section 5.1 outlines the widely used
benchmark datasets, evaluation metrics, and open-source codes. Section 6 discusses the current
challenges and suggests future research directions. Section 7 summarizes the paper.

2 PRELIMINARY AND NOTATION


First, let us consider a typical scenario on a e-commerce website: Jack is a customer who prefers to
wear blue jeans and has queried about “jeans” one day. From the list of returned items, Jack clicked
some attractive items for detailed information. During this week, he also visited some online shops
for checking out T-shirts. Finally, on Sunday, Jack purchased a blue jean from his favorite brand
as a birthday gift and added another T-shirt in the same shop to his shopping cart. Based on Jack’s
behavior, the platform has collected rich information (submitted query words, clicked items, visited
shops, preferred properties and brands) for recommending potential interesting items to him in the
future. This kind of recommendation scenarios could also be observed on other websites. In general,
multiple kinds of objects and historical user behaviors form a knowledge graph. Figure 2 shows a toy
example on e-commerce websites.
Definition 2.1. Knowledge Graph (KG) [24] is defined as a directed graph G = (V, E), where
V is the set of nodes and E ⊆ V × V is the set of edges between nodes in V. G is associated with
a node type mapping function ϕ: V → A and an edge type mapping function ψ : E → R, where
|A| > 1 and/or |R| > 1. Each node v ∈ V belongs to one particular node type in the node type

, Vol. 1, No. 1, Article . Publication date: April 2020.


4 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

Fig. 3. The general workflow of a GNN-based Knowledge Aware Deep Recommender (GNN-KADR)
system.

set A: ϕ(v) ∈ A, and each edge e ∈ E belongs to a particular relation type in relation type set R:
ψ (e) ∈ R. Mining knowledge graphs is usually based on a basic entity-relation-entity triplet (u, e, v),
where u ∈ V, e ∈ E, v ∈ V denote the head, relation and tail of this triplet. In this paper, we refer
these entity-relation-entity triplets as knowledge triplets for simplicity. Here the types of nodes u and
v could be either same or different depending on the context.
Definition 2.2. Neighborhood N (v). Given a knowledge graph G = (V, E), for a node v, its neigh-
borhood N (v) is defined as the set of nodes that directly connect to v, i.e., {w |(w, e, u) or (u, e, w), e ∈
E}.
Definition 2.3. r -Neighborhood Nr (v). Given a knowledge graph G = (V, E), for a node v, its
r -neighborhood Nr (v) is defined as the set of nodes that connect to v with edges of type r , i.e.,
{w |(w, e, u) or (u, e, w), where e ∈ E and ψ (e) = r }.
User-Item Recommendation In general, V can be further written as V = V1 ∪ V2 ∪ · · · ∪ Vi ∪ · · · ∪
Vn , where Vi denotes the set of nodes with type i and n = |A|. In recommendation we denote V1 as
the set of user nodes and V2 as the set of item nodes, with V3 , . . . , Vn representing the other objects’
nodes (brand, properties, etc). We also denote E = El abel ∪ Eunl abel , where El abel ⊆ V1 × V2
represents the set of edges between user and item nodes, and Eunl abel = E \ El abel represents other
edges. Since a typical recommendation setting in real world is to predict a user’s preferred items
based on his historical behaviors, we use G = (V, E) to denote the graph constructed from historical
data, and G P = (V P , E P ) to denote the graph of real future. The user-item recommendation problem
can hence be formulated as a link prediction problem on graph:
Input: A KG G = (V, E) constructed based on historical data.
Output: A predicted edge set E bp , which is the prediction of the real edge set E P on G P .
l abel l abel

Throughout this paper, the bold uppercase characters are used to denote matrices and bold
lowercase characters are used to denote vectors. Unless particularly specified, the notations used in
this paper are illustrated in Table 1.

3 CATEGORIZATION AND FRAMEWORKS


The general workflow of a GNN-based Knowledge Aware Deep Recommender (GNN-KADR)
system is shown in Figure 3. The graph embedding module of GNN-KADR first learns to produce an
embedding for every graph node (including the user and item nodes), which encodes the information
distilled from the input knowledge graph. Next, for a given user, the ranking module computes a

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 5

Table 1. Commonly used notations.

Notations Descriptions
|·| The size of a set.
⊙ Element-wise product.
|| Concatenation
G A knowledge graph.
V The set of nodes in a knowledge graph.
E The set of edges between nodes in V.
A The node type set.
R The edge relation type set.
v A node v ∈ V.
ei, j An edge ei, j ∈ E.
(u, e, v) A knowledge triplet where u ∈ V, e = eu,v ∈ E, v ∈ V denote the head, relation and tail of this triplet.
zv ∈ Rd The feature vector of node v
ze ∈ Rc The feature vector of edge e = eu,v or ev,u
zei, j ∈ Rc The feature vector of edge ei, j
nu ∈ Rk The context representation of node u
nur ∈ Rk Type-r context representation of node u
X∈ R N ×d The feature matrix of nodes in a knowledge graph.
A ∈ R N ×N The adjacency matrix of a knowledge graph
N ∈ R N ×k The context representations of all nodes in a knowledge graph
N (v) The neighborhood of a node v
Nr (v) r -Neighborhood of a node v.
n The number of nodes, n = |V |.
m The number of edges, m = |E |.
d The dimension of a node feature vector.
c The dimension of an edge feature vector
k The dimension of v’s context representation.
σ (·) The sigmoid activation function
β(·) The LeakyReLU activation function
γ (·) An activation function, e.g., sigmoid, ReLU, etc.
W, Θ, b Learnable model parameters.

match score between this user and each of the candidate items according to their corresponding
embeddings. Those items with top-N match scores (or match scores above a user-specified threshold)
are linked (recommended) to this user.
In this process, the core component of the system is the graph embedding module, which is usually
a Graph Neural Network (GNN) in the state-of-the-art frameworks. Each layer of a GNN consists of
two basic components: Aggregator and Updater. For a node v, the Aggregator aggregates the feature
information from the neighbors of v to produce a context representation. The Updater then utilizes
this context representation as well as other input information to obtain a new embedding for node v.
Stacking K different GNN layers or reusing the same GNN layer K times expands GNN’s receptive
field to K-hop graph neighborhoods.
We categorize the aggregators of GNNs into relation-unaware aggregator, relation-aware subgraph
aggregator and relation-aware attentive aggregator. On the other hand, we divide the updaters into

, Vol. 1, No. 1, Article . Publication date: April 2020.


6 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

Table 2. Taxonomy and representative publications of GNN-based Knowledge Aware Deep Recom-
mender (GNN-KADR) systems.

GNN Category Publications


Relation-unaware Aggregator [4], [42]
Aggregator Subgraph Aggregator [41], [45], [46]
Relation-aware Aggregator
Attentive Aggregator [6], [15], [23], [30], [31], [33], [36]
Context-only Updater [4], [6], [15], [23], [31], [36], [45]
Updater Single-interaction Updater [30], [31], [41], [42], [46]
Multi-interaction Updater [33]

Table 3. Summary of studied GNN-KADR systems.

Method Aggregator Updater Issues Solved


MEIRec [4] nu = Avд/LST M/CN N ({zv |v ∈ N (u)}) zunew = nu Scalability; Cold-Start
xu,v = MLP(zv ||ze )
∗ = WT · γ (W · (x
αu,v u,v ||zu ) + b1 ) + b2
GraphRec [6] 2 1
∗ )
exp(αu,v zunew = γ (W · nu + b) —
αu,v = Í ′ ∗
v ∈N (u) exp(α u,v ′ )
nu = v ∈N (u) αu,v xu,v
Í

Ai, j = fweiдht (X[i, :], X[j, :], ei, j )


V2HT [15] A
Ai,′ j = α · Í ′ i,Aji, j ′ if i , j else 1 − α Xnew = γ (NW) Data Sparsity, Multimodality
j
1 1
N = D̃′− 2 A′D̃′− 2 X, where D̃i,i ′ = Í A′
j i, j
exp(z ·z )
αu,v = Í u v
DGRec [23] Í v ∈N (v ) exp(zu ·zv ) zunew = ReLU (W · nu ) Dynamicity
nu = v ∈N (u) αu,v zv
A′ = A + I
Xnew = γ (N + D− 2 · I · D− 2 X) · W
1 1 
KGNN-LS [30] 1 1 Personalization
N = D− 2 AD− 2 X where Di,i = j Ai,′ j
Í

∗ = д(zv , ze ) new = γ W · (zu + nu ) + b



αu,v zu,sum
∗ )
exp(αu,v
KGCN [31] αu,v = Í ′ new = γ W · (zu ||nu ) + b —

∗ zu,concat
exp(αu,v ′)
Í v ∈N (u) new
nu = v ∈N (u) αu,v zv zu,cont ex t = γ W · nu + b


∗ = (W z )T tanh(W z + z )
αu,v e v e u e
zunew = β W1 · (zu + nu ) + β W2 · (zu ⊙ nu )
∗ )
exp(αu,v
 
KGAT [33] αu,v = Í ′ ∗

exp(αu,v ′)
Í v ∈N (u)
nu = v ∈N (u) αu,v zv
LeakyRe LU (w (W z ⊙(W z | |W z )))
u e e p u p v
αu,v = Í
DANSER [36] Í v ∈N (u) LeakyRe LU (wu (We ze ⊙(Wp zu | |Wp zv ))) zunew = γ (W · nu + b) Dynamicity
nu = v ∈N (u) αu,v zv
r =Ísoftmaxv {Wr1 zv · Wr2 zu |v ∈ Nr (u)}

αv,u
nur = ∀v ∈Nr (u) αv,u
r z zunew = ReLU W · (zu ||nu ) + b

RecoGCN [41] v Scalability
nu = r nur
Í

nu = Avд/Pool {ReLU (W1 zv + b)|v ∈ N (u)}, {αv } zunew = ReLU W2 · (zu ||nu ) + b
 
PinSage [42] Scalability
nur = 1
Wr zv
Í

zunew = γ W · γ (nu )

STAR-GCN [45] v ∈N r (u) |N r (u) | · |N r (v) | Cold-Start
Í r
nu = r nu
nur = Avд {zv |v ∈ Nr (u)}

IntentGC [46] zunew = γ (W · zu + nu ) Scalability
nu = r| R=1|−2 Wr nur
Í

three categories, i.e., context-only updater, single-interaction updater and multi-interaction updater.
Various categories of aggregators and updaters are described in Figure 4 and Figure 5 respectively. In
the following, we discuss them in details.

3.1 Aggregator
3.1.1 Relation-unaware Aggregator. For a target node u, a relation-unaware aggregator aims to
aggregate information from parts or all of u’s neighboring nodes to produce a context representation.
However, in this process, the relation r = ψ (e) (e = eu,v or ev,u ) between the target node u and any

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 7

Fig. 4. Description of relation-unaware aggregator, relation-aware subgraph aggregator, and relation-


aware attentive aggregator.

, Vol. 1, No. 1, Article . Publication date: April 2020.


8 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

Fig. 5. Description of context-only updater, single-interaction updater, and multi-interaction updater.

neighboring node v is ignored, and its information is hence not encoded in the context representation
nu .
Fan et al. [4] points out that existing methods used in industry for intent recommendation rely
on extensive laboring feature engineering and fail to utilize the rich interaction between users and
items, which limits the model performance. To solve these issues, they model the complex objects
(i.e., users and items with their attributes) as well as their interactions as a knowledge graph, and
present a framework called MEIRec to learn the object embeddings for recommendation. The GNN
in MEIRec generates a context representation for a target node u by
nu = д({zv |v ∈ N (u)}) (1)
where zv is the embedding of node v and д is a aggregate function that can be either average, LSTM
or CNN depending on the context. For example, they choose д to be average function if u is an item
node, since there is usually no priority among users that have clicked/purchased an item. On the
other hand, they choose LSTM as д if u represents a user node, because a user usually clicks items
with timestamp and its neighbors can be viewed as a sequence data.
Ying et.al. [42] introduce a data-efficient large-scale deep recommendation engine PinSage that is
developed and deployed at Pinterest. PinSage combines efficient random walks and GNN to generate
embeddings of nodes that incorporate both graph structure as well as node feature information. For
each target node u, PinSage first measures the importance of u’s neighboring nodes by simulating
random walks starting from u and computes the L 1 -normalized visit count of nodes. Then the context
representation nu is computed by
nu = Avд/Pool({ReLU (W1 zv + b)|v ∈ N (u)}, {αv }) (2)
where W1 , b are parameters to learn, and {αv } are L 1 -normalized visit counts of u’s corresponding
neighboring nodes.
3.1.2 Relation-aware Subgraph Aggregator. To handle different relations in the knowledge
graph, the relation-aware subgraph aggregators split the neighborhood graph into multiple subgraphs
such that all edges of a subgraph belong to only one of the relation types in R. Each subgraph is
assigned with an aggregator characterized by a unique set of parameters, and its information is
extracted by this aggregator to produce a relation-sensitive context representation. We use nur to
denote the representation generated from a subgraph with type r edges. Finally, {nur 1 , nur 2 , . . .} are
further fused together to produce the overall context representation for target node u. A typical
example is shown in Figure 4b.

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 9

The complex interactions in agent-initialized social e-commerce can be formulated as a knowledge


graph with numerous types of relations between three types of nodes, i.e., users, selling agents and
items. Xu et al. [41] propose a novel framework RecoGCN to effectively aggregate the heterogeneous
features in this knowledge graph. The “relation-aware aggregator” used in RecoGCN is:

r
= softmaxv {Wr1 zv · Wr2 zu |v ∈ Nr (u)}

αv,u (3)
nur = r
Õ
αv,u zv (4)
∀v ∈N r (u)

nur
Õ
nu = (5)
r
where Wr1 and Wr2 are two learnable transformations assigned to the relation type r ∈ R.
To boost the performance in recommender systems, The STAR-GCN architecture introduced by
Zhang et al. [45] employs a multi-link graph convolutional encoder to learn the node representations.
Each representation type r ∈ R is assigned with a specific transformation. Its aggregator is hence
formulated as:
1
nur = Wr zv
Õ
p (6)
v ∈N r (u) |N r (u)| · |N r (v)|
nur
Õ
nu = (7)
r
where {Wr |r ∈ R} are the parameters to learn.
IntentGC proposed by Zhao et al. [46] is a recommendation framework that captures both the
explicit user preference and heterogeneous relationships of auxiliary information. To perform large-
scale recommendation, IntentGC introduces a vector-wise aggregator:
nur = Avд {zv |v ∈ Nr (u)}

(8)

R |−2
nu = Wr nur (9)
r =1
This aggregator is computationally efficient since it avoids unnecessary computations by replacing
the expensive operation W · (nur 1 ||nur 2 || . . .) with the summation of multiple small matrix products.
3.1.3 Relation-aware Attentive Aggregator. In contrast to the other two types of aggregators,
relation-aware attentive aggregators convert the neighborhood graph into a weighted graph, where the
f (zu ,ze ,zv )
weight of each edge is a function of corresponding knowledge triplet, e.g., w = Í ′ ,
v ∈N (u) f (zu ,ze ′ ,zv ′ )
w = f (zu , ze , zv ), etc. These weights capture the rich semantic information encoded in the edges of a
knowledge graph, and measure the relatedness of different knowledge triplets to the target node u.
Wang et al. [30] propose an end-to-end framework KGNN-LS that utilizes a knowledge graph as
an addition information source to improve recommendation. For a given user, KGNN-LS applies a
trainable function to identify important knowledge graph relationships and converts the knowledge
graph into a user-specific weighted graph. The aggregator in KGNN-LS is hence formulated as:
A′ = A + I (10)
− 12 − 21
N = D AD X (11)
= f (zei, j ) and D is a diagonal matrix with Di,i = j Ai, j . f is a user-specific scoring
Í ′
where Ai, j
function.
Li et al. [15] introduce a novel framework named V2HT, which is capable of boosting the
performance of micro-video hashtag recommendation by jointly considering the sequential feature

, Vol. 1, No. 1, Article . Publication date: April 2020.


10 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

learning, the video-user-hashtag interaction, and the hashtag correlations. Similar to KGNN-LS, they
assign different weights on different types of edges to transform the knowledge graph into a weighted
graph. Specifically, the aggregator used in V2HT is:
Ai, j = fweiдht (X[i, :], X[j, :], ei, j ) (12)
A
(
α · Í ′ i,Aji, j ′ if i , j
Ai, j =
′ j (13)
1−α otherwise
1 1
N = D̃′− 2 A′D̃′− 2 X (14)
where the subscripts i and j denote the i t h and jt h node in the graph, and D̃ is a diagonal matrix with
D̃i,i = j Ai,′ j . Here α determines the weights assigned to a node itself and other correlated nodes.
Í
When α → 1, the feature of a node itself will be ignored; when α → 0, neighboring information will
be ignored.
Fan et al. [6] propose a novel graph neural network framework GraphRec for social recommenda-
tion. The knowledge graph in social recommendation usually consists of two graphs: a social graph
denoting the relationship between users and a user-item graph denoting interactions between users
and items. GraphRec introduces three aggregators to process these two different graphs, i.e., user
aggregator, item aggregator and social aggregator. The user and item aggregators extract information
from the user-item graph while the social aggregator distills information from the social graph. These
aggregators work in a similar way. Thus we only explain the item aggregator here for simplicity.
Mathematically, the item aggregator is formulated as:
xu,v = MLP(zv ||ze ) (15)

αu,v = WT2 · γ (W1 · (xu,v ||zu ) + b1 ) + b2 (16)
∗ )
exp(αu,v
αu,v = Í ∗ (17)
v ′ ∈N (u) exp(αu,v )
Õ
nu = αu,v xu,v (18)
v ∈N (u)
where αu,v represents the attention weight of the interaction with v in contributing u’s context
representation.
Recommendation in online communities faces two challenges: 1) users’ interest are dynamic, and
2) users are influenced by their friends. To address these challenges, Song et al. [23] introduce a
novel framework DGRec based on a dynamic graph attention neural network. Specifically, They
first model dynamic user behaviors with a recurrent neural network, and its outputs are used as
initial graph node embeddings. Then, a graph attention neural network is adopted to model context
dependent social influence, which dynamically infers the influencers based on user’s current interest.
The aggregator in this graph attention neural network is:
exp(zu · zv )
αu,v = Í (19)
v ∈N (v) exp(zu · zv )
Õ
nu = αu,v zv (20)
v ∈N (u)
The KGCN framework proposed by Wang et al. [31] is an end-to-end recommender system
that automatically discovers the high-order structure information and semantic information of
Knowledge Graph (KG). The key idea of KGCN is to aggregate and incorporate neighborhood
information with bias when calculating the representation for a given node in the knowledge graph.
It is implemented by weighting neighbors with scores dependent on the connecting relations and

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 11

nodes, which characterizes both the semantic information of KG and user’s personalized interests in
relations. Specifically, the aggregator in KGCN is:

αu,v = д(zv , ze ) (21)
∗ )
exp(αu,v
αu,v = Í ∗ (22)
v ′ ∈N (u) exp(αu,v ′ )
Õ
nu = αu,v zv (23)
v ∈N (u)
Wang et al. [33] propose a method named KGAT to explicitly model the high-order connectivities
in KG in an end-to-end fashion. The aggregator in KGAT works in a similar way as that of KGCN.
The only difference is that it takes all the three embeddings zu , ze , and zv of a knowledge triplet as
input to compute the attention score:

αu,v = (We zv )T tanh(We zu + ze ) (24)
∗ )
exp(αu,v
αu,v = Í ∗ (25)
v ′ ∈N (u) exp(αu,v ′ )
Õ
nu = αu,v zv (26)
v ∈N (u)
Here We is a projection matrix to learn, which transforms inputs from the node embedding space to
the edge embedding space.
In social recommendation, there are four different social effects in recommender systems, including
two-fold effects in user domain, i.e., a static effect from social homophily and a dynamic effect
from social influence, and the other symmetric two-fold ones in item domain. To capture these four
effects, Wu et al. [36] propose a framework named DANSER, which is composed of two dual Graph
ATtention networks (GAT) [27]: one dual GAT for users-including a GAT to capture social homophily
and a GAT to capture social influenceâĂŤand the other dual GAT for items. The aggregators in
DENSER work in a similar way. We show one of them as an example:
LeakyReLU (wu (We ze ⊙ (Wp zu ||Wp zv )))
αu,v = Í (27)
v ∈N (u) LeakyReLU (wu (We ze ⊙ (Wp zu ||Wp zv )))
Õ
nu = αu,v zv (28)
v ∈N (u)
where Wp , We , and wu are two weight matrices and one weight vector to learn.
3.1.4 Discussion: Compared to the other two types of aggregators, the relation-unaware aggrega-
tor loses the rich semantic information encoded in the edges of a Knowledge Graph (KG), which
limits the model performance. On the other hand, although the relation-aware subgraph aggregator is
able to model various semantic relations in KG, it has several limitations. First, as each relation type
r ∈ R is assigned with a unique set of parameters, the number of training parameters in this type of
aggregators significantly increases when the relation type set R becomes larger. Those methods utiliz-
ing relation-aware subgraph aggregators are hence not suitable for processing knowledge graphs with
numerous types of relations. Second, these aggregators process the subgraphs of different relation
types independently. The correlation between different relation types are hence not encoded in the
learned context representation nu , which limits the recommendation performance. The relation-aware
attentive aggregator utilizes a single function to compute attention scores for different relation types
in KG, and hence can be applied for large-scale knowledge graphs. Meanwhile, it also improves

, Vol. 1, No. 1, Article . Publication date: April 2020.


12 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

the explainability of deep recommender systems by weighting the contributions of different nodes
neighboring to the target node u.

3.2 Updater
3.2.1 Context-only Updater. For any node u in the knowledge graph, the context-only updater
only receives u’s context representation nu as input, and produces a new representation zunew = f (nu )
for this node. Here f is the updater function.
MEIRec [4] simply takes nu as the new embedding for node u, i.e., zunew = nu , which is com-
putational efficient but may not be optimal. To overcome this issue, GraphRec [6], V2HT [15],
DGRec [23], KGCN [31], and DANSER [36] propose to approximate f via a MLP:
zunew = γ (W · nu + b) (29)
where γ is a non-linear activation function such as ReLU, and the bias b could be set to 0. STAR-
GCN [45] improves the non-linearity of f by applying another activation function on nu before
sending it to the MLP layer:
zunew = γ (W · γ (nu ) + b) (30)
3.2.2 Single-interaction Updater. For any node u in the knowledge graph, the single-interaction
updater takes both u’s context representation nu and u’s current embedding zu as input to compute
a new representation zunew = f (nu , zu ). f is a function that involves a binary operator such as sum,
concatenation, etc, which is applied to both nu and zu . This operator builds an interaction between
the node u and its context, and could potentially improve the model performance.
KGNN-LS [30] and KGCN [31] both utilize summation as the interaction operator. The only
difference is that KGNN-LS applies a scaling operation on the input node embeddings before feeding
them to the operator. Their updaters can be written as
zunew = γ W · (λzu + nu ) + b)

(31)
where λ is the scaling factor and b is a bias that could be set to 0.
RecoGCN [41] and PinSage [42] replace the summation operator in the above updaters with the
concatenation operation. It improves the expressiveness of the updater by doubling the learnable
parameters, but also increases the computation cost.
zunew = ReLU W · (zu ||nu ) + b)

(32)
3.2.3 Multi-interaction Updater. The multi-interaction updater is a combination of multiple
single-interaction updaters with different binary operators, i.e., zunew = д f 1 (nu , zu ), f 2 (nu , zu ), . . . .
Here д is a function to fuse the representations obtained from different single-interaction updaters.
This type of updaters improves feature interactions and may lead to better performance.
The only method in current literature that utilizes multi-interaction updaters is KGAT [33]. It con-
siders two different operators: summation and element-wise product. The mathematical formulation
of this updater is
zunew = LeakyReLU W1 (zu + nu ) + LeakyReLU W2 (zu ⊙ nu )
 
(33)
The authors choose element-wise product because it makes the information being propagated in the
graph sensitive to the affinity between zu and nu , e.g., passing more information from similar nodes.
3.2.4 Discussion: Since the information of the target node u is missing in the inputs of context-
only updaters, they may not be sufficient to model the interaction between u and its context, which
limits the quality of learned representations. Single-interaction updaters address this issue by manually
encoding the feature interactions between zu and nu . Multi-interaction updaters further improve
it by considering different binary operators at a time. In general, more attention is needed on the

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 13

multi-interaction updaters, especially for the fusion function д. For example, instead of using a simple
operation like summation, is it possible to learn a fusion function д?

4 SOLUTION TO PRACTICAL RECOMMENDATION ISSUES


In this section, we discuss the solution proposed by the state-of-the-art frameworks to practical
recommendation issues such as scalability, data sparsity, and so on.

4.1 Cold Start


The cold-start problem [1], i.e., how to make proper recommendations for new users or new items,
is a daunting dilemma in practical recommender systems. On one hand, new users and new items
occupy a large portion in many real-world applications such as YouTube [3]. On the other hand,
the performance of recommendation largely depends on sufficient amount of historical user-item
interaction data, and degrades significantly on new users/items.

4.1.1 Uniform Term Embedding. MEIRec [4] proposed by Fan et al. is an intent recommendation
framework that recommends the most relevant intent (i.e., query) to a user based on his/her click
history. The authors find that queries and titles of items are constituted by terms, and the number
of terms are not many. So they propose to represent the queries/titles with a small number of term
embeddings. It significantly reduces the number of training parameters in the model. The new
queries/items that have never been searched/added before can also be represented by these terms.

4.1.2 Masked Embedding Training with Encoder-Decoder Architecture. STAR-GCN [45]


adopts a multi-block graph encoder-decoder architecture. Each block contains two components: a
graph encoder and a graph decoder. The graph encoder generates node representations by encoding
semantic graph structure and input content features, and the decoder aims to recover the input node
embeddings. To train STAR-GCN, the authors mask some percentage of the input nodes at random
and then reconstruct the clean node embeddings utilizing their context information. This is referred as
masked embedding training mechanism. By using this mechanism, STAR-GCN can learn embeddings
for nodes that are not observed in the training phase. In a cold start scenario, STAR-GCN initializes
the embeddings of new nodes to be zero and gradually refines the estimated embeddings by multiple
blocks of GNN encoder-decoders.

4.2 Scalability
In most existing graph neural networks, their aggregators have to visit the full neighborhood of a
node to produce its node embedding, which is computationally intractable for large-scale knowledge
graphs. In the following, we show some solutions proposed by the studied methods to this scalability
issue.

4.2.1 Important Node Sampling. Instead of examining k-hop graph neighborhood to compute
node embeddings, PinSage [42] defines importance-based neighborhoods, where the neighborhood
of a node u is defined as the T nodes that exert the most influence on node u. Specifically, it simulates
random walks starting from u and computes the L 1 -normalized visit count for nodes visited by
random walks. The neighborhood of u is hence the top T nodes with the highest normalized visit
counts.

4.2.2 Meta-path Defined Receptive Field. MEIRec [4] and RecoGCN [41] propose to leverage
the semantic-aware meta paths to carve out concise and relevant receptive fields for each node, which
is referred as meta-path defined receptive field.

, Vol. 1, No. 1, Article . Publication date: April 2020.


14 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

Fig. 6. An example of meta-path defined receptive field.

Definition 4.1. Meta-path [41]. A meta-path ρ is defined as a path in a knowledge graph in the
r1 r2 rl
form of t 1 −→ t 2 −→ · · · −→ tl +1 , where there is a composite relation R = r 1 ◦ r 2 ◦ · · · ◦ rl between
node type t 1 and tl +1 .
Definition 4.2. Meta-path defined Receptive Field (MRF) [41]. Given a knowledge graph G =
ρ
(V, E), for a node v and
ρ ρ ρ  a meta-path ρ of length l, a meta-path defined receptive field Fv =
fv (0), fv (1), · · · , fv (l) is defined as the set of nodes that can be travelled to or passed by from
ρ
node v via the meta-path ρ, where fv (k) denotes the set of nodes reached by k jumps on ρ.
Figure 6 shows an example of MRF. Compared to k-hop graph neighborhood, MRF focuses only
on the relations selected based on prior knowledge, and accelerates the training by greatly reducing
the number of nodes in the computation graph. Moreover, it is possible to enlarge the receptive field
of a node u by computing multiple embeddings along different meta-paths and fusing them together
to obtain a final representation:
zunew = д(zu,
new new new
ρ 1 , zu, ρ 2 , . . . , zu, ρ K ) (34)
where д is a fusion function. Note that the computation of these embeddings are independent from
each other, and hence can be done in parallel.
4.2.3 Vector-wise Aggregator and Updater. To remove the limitation of training on clustered
mini-graphs for large-scale graphs, IntentGC [46] introduces a special graph neural network, which
replaces the normal aggregator and updater in GNNs with a vector-wise aggregator and a vector-wise
updater. The key idea is to avoid unnecessary feature interactions by replacing the expensive matrix
multiplication between a huge weight matrix and a giant feature vector (formed by concatenation
of many vectors) with a summation of multiple small matrix products. The detailed formulation is
shown in Table 3.

4.3 Personalization
4.3.1 Graph Translation. KGNN-LS [30] performs personalized recommendation by converting
the knowledge graph into a user-specific weighted graph, on which a graph neural network is applied

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 15

Table 4. Summary of datasets under different scenarios

# Relation
Scenario Dataset # Entities # Connections Citation
Types
Amazon-books 95,594 847,733 39 [28], [46]
Book
Book-crossing 25,787 60,787 18 [30], [31]
Citation S2 - - - [40]
Douban 46,423 331,315 5 [23], [45]
Movie Flixter 1,049,000 26,700,000 - [45]
MovieLens 102,569 499,474 32 [17], [25], [30], [45]
Music Last-FM 9,336 15,518 60 [30], [33]
Delicious 5,932 15,328 - [37]
POI Yelp 159,426 6,818,026 6 [23], [25]
Dianping 28,115 160,519 7 [30]
Social Network Epinions 175,000 508,000 - [6], [36]

to compute personalized item embeddings. The weights of edges on this new graph are computed by
a trainable function f (ze ) that identifies important knowledge graph relationships for a given user.
We refer this technique as graph translation.

4.4 Dynamicity
Knowledge graph could evolve over time dynamically, i.e, its nodes/edges may appear or disappear.
For example, in the scenario of social recommendation, a user’s friend list may change from time
to time. When a user adds a number of new friends with similar interests, the recommender system
should update its recommendation strategy accordingly and reflect this change in its results.
To address this issue, Wu et al. [23] consider a dynamic feature graph setting. Specifically, for
each user, they construct a graph where each node represents either this user or one of his/her friends.
If user u has |N (u)| friends, then the total number of nodes in this graph is |N (u) + 1|. The node
feature of friends in this graph is kept unchanged but that of user u is updated whenever u consumes
a new item. Moreover, to capture the context-dependent social influence, the authors propose a graph
attention neural network, which utilizes an attention mechanism to guide the influence propogation
in its aggregator. Each friend of user u is assigned with an attention weight which measures its level
of influence.

5 DATASETS, CODES AND EVALUATIONS


5.1 Datasets
We summarize the state-of-the-art papers and list their publicly available datasets in Table 4 alpha-
betically. Generally speaking, we fit those datasets into 6 major scenarios: book, citation, movie,
music, point of interest (POI) and social network. Here the general goal in each dataset is to let the
recommender system infer users’ preferred items based on users’ past history, for example, watched
movies or read books. The size and complexity of the datasets are upon the problem settings in
different papers.

, Vol. 1, No. 1, Article . Publication date: April 2020.


16 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

Table 5. Summary of different evaluation metrics

Evaluation Metrics Formula Papers


Metrics (@K)
TP
Precision T P +F P [17], [25], [36], [40]
TP
Recall T P +F N [15], [23], [30], [31], [33], [40]
Precision·Recall
F1 Score 2 · Precision+Recall [29], [31]
1 Í |Q | 1
MRR |Q | i=1 r ank i [25], [41], [42], [46]
NDCG = iDCG
DCG
Ín 2reli −1
NDCG DCG = i=1 log (i+1) [15], [23], [33], [40], [41]
Í |REL |2 2reli −1
iDCG = i=1 p log
2 (i+1)
q Í
1 n
RMSE n i=1 (ŷi − yi )2 [6], [36], [45],
Í |U |
HR i=1 I () [41], [42]
Ín
1 2
MAE n i=1 |ŷi − yi | [6], [36]
AP = m1 kK=1 Precision@k
Í
· rel(k)
mAP Í |U | [17]
MAP = |U1 | i=1 AP
AUC [5], [29], [30], [31], [36], [46]

6 FUTURE DIRECTIONS
Although existing works have established a solid foundation for GNN-based Knowledge-Aware
Deep Recommender (GNN-KADR) systems, it is still a young and promising research field. In this
section, we suggest several prospective future research directions.

6.1 Scalability Trade-Off


The scalability of existing GNN-KADR systems is gained at the price of sacrificing knowledge graph
completeness. By sampling a fixed number of neighboring nodes, a node may lose its influential
neighbors. By defining receptive field with meta-paths, the knowledge graph perceived by the model
may be deprived of some important semantic information. Moreover, selecting and weighting meta-
paths requires extensive prior knowledge and may be hard for some real-world applications. How to
trade-off the scalability and the knowledge graph completeness is an interesting research direction.

6.2 Dynamicity
Knowledge graphs are in nature dynamic in a way that nodes and edges may appear or disappear,
and that node/edge inputs may change time by time [38]. Moreover, the types of relations between
nodes in a KG may also change along time in real-world scenarios. For example, in agent-initialized
social e-commerce, a user may become a selling agent to his friends someday in the future. Although
DGRec proposed by Wu et al. [23] has partially addressed the dynamicity of graph, few of the
GNN-KADR systems consider accommodating their frameworks to cases where the knowledge
graph contains dynamic spatial and semantic relations. New aggregators and updaters are needed to
adapt to the dynamicity of knowledge graphs. According to our literature survey, we argue that this
is still an open area for future research.

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 17

6.3 Explainability of Recommendation


Compared to traditional content or collaborate-filtering based recommender systems, explainability
is particular important for GNN-KADR systems, because non-expert humans cannot intuitively
determine the relevant context within a knowledge graph, for example, when identifying influential
users in social network that are good candidates for selling agents in social e-commerce. In addition,
Making explainable predictions to users allow them to understand the factors behind the network’s
recommendations (i.e., why was this item/services recommended? [19, 39]), and is helpful to earn
user’s trust on the system. It also helps the practitioner prob weights and activations to understand
more about the model [26].
There are several existing works focusing on the explainability of GNNs. GNNEXPLAINER [43]
proposed by Ying et al. is a model agnostic approach that provides interpretable explanations
for predictions of any GNN-based model. Given an instance, it identifies a compact subgraph
structure and a small subset of node features that have a crucial role in GNNâĂŹs prediction. Pope
et al. [18] extend three common explainability methods, i.e., gradient-based saliency maps [21],
Class Activation Mapping (CAM) [44], and Excitation Backpropagation (EB) [44], from CNNs to
GNNs to identify important aspects of the computation. However, these methods are designed for
homogeneous graphs and do not take the heterogeneity of knowledge graph into account.
On the other hand, some other methods, e.g., KPRN [35], EIUM [10], RuleRec [16], attempt to
utilize the knowledge graph as an information source to make explainable recommendations. However,
these methods usually sample paths from the knowledge graph and extract information from them
via non-GNN algorithms such as RNN. Thus, compared to GNNs, the topological and semantic
structure of knowledge graphs is corrupted in these cases, leading to unsatisfied performance.
To our best knowledge, how to build an explainable GNN-based knowledge-aware deep recom-
mender system is still an open problem and is unexplored in current literature. We believe this is the
next frontier.

6.4 Fairness of Recommendation


Nowadays, recommender systems are used in a variety of domains affecting people’s lives. This has
raised concerns about possible biases and discriminations that such systems might exacerbate. A
typical example is news recommendation, where articles with biased stand may affect people’s vote
decisions. There are two kinds of biases inherent in recommender systems [7]: observation bias and
bias stemming from imbalanced data. Observation bias exists due to a feedback loop which causes
the model only learn to predict recommendations similar to previous ones. In contrast, imbalance in
data is caused when a systematic bias is present due to society, historical or other ambient biases.
This bias is implicit in data so that recommender systems are usually unaware of them.
Various methods have been proposed to build fair recommender systems. Alex et al. [2] introduce
a pairwise recommendation fairness metric to evaluate algorithmic fairness concern in recommender
systems, and offer a novel regularizer to encourage improve this metric during model training. Sahin
et al. [9] propose complementary measures to quantify bias with respect to protected attributes and
present an algorithm for computing fairness-aware re-ranking of results. Specifically, their algorithm
seek to achieve a desired distribution of top ranked results with respect to one or more protected
attributes. Jurek et al. [14] propose to quantify the user unfairness or discrimination caused by the
post-processing module of recommender systems, which have the original goal of improving diversity
in recommendation.
However, these algorithms are mainly designed for traditional recommender systems and cannot
be used for GNN-KADR systems directly, since they do not consider any information or bias

, Vol. 1, No. 1, Article . Publication date: April 2020.


18 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

introduced by the knowledge graph. Therefore, building a fair GNN-KADR system is a promising
but largely-unexplored area where more studies are expected.

6.5 Cross-Domain Recommendation


Besides mining a single knowledge graph, there is an increasing need of computing recommendations
using data from multiple sources. For instance, a customer may be users of more than one social
networks at the same time, e.g., Facebook and LinkedIn. Each of these social networks collect data
regarding this customer and embed them into their own knowledge graphs. Thus, it is reasonable to
leverage information from all these knowledge graphs to boost the recommendation performance.
However, there are two significant gaps in the current research. The first gap is that existing
research lacks exploration on effectively integrating information from multiple knowledge graph
sources [12]. Most of them attempt to build associations between two different knowledge graphs by
connecting related entities (i.e., nodes), as in Wang et al. [34]. The group knowledge represented by
a collection of nodes or edges of the same types, which may be critical to align multiple knowledge
graphs, is ignored in these methods.
Secondly, most existing work provide cross-domain recommendations based on traditional tech-
niques such as collaborative filtering (CF). Some other researches use spectral clustering algorithms
to aggregate knowledge from graphs of different domains, but make strong assumptions that all
graphs should be available simultaneously [8]. Limited effort have been made to utilize GNNs for
cross-domain recommendations. We believe this is a promising research field, since GNNs perform
better than spectral clustering algorithms [13].

7 CONCLUSION
In this survey, we conduct a comprehensive overview of GNN-based Knowledge Aware Deep
Recommender (GNN-KADR) systems. We provide a taxonomy for their core component, i.e., the
graph embedding module, which is usually a GNN in the state-of-the-art frameworks. According
to the taxonomy, we categorize the aggregators of these GNNs into three groups: relation-unaware
aggregator, relation-aware subgraph aggregator and relation-aware attentive aggregator. We also
divide their updaters into three categories: context-only updater, single-interaction updater, and
multi-interaction updater. We provide a thorough review, comparisons and summarizations of these
systems within or between categories. Then, we discuss the solution proposed by these frameworks
to the common practical recommendation issues such as cold-start, scalability and so on. Datasets,
open-source codes and benchmarks for GNN-KADR systems are also summarized. Finally, we
suggest future research directions in this rapidly growing field.

REFERENCES
[1] Ayse Merve Acilar and Ahmet Arslan. 2009. A collaborative filtering method based on artificial immune network.
Expert Syst. Appl. 36, 4 (2009), 8324–8332. https://doi.org/10.1016/j.eswa.2008.10.029
[2] Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, and
Cristos Goodrow. 2019. Fairness in Recommendation Ranking through Pairwise Comparisons. In Proceedings of the
25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK,
USA, August 4-8, 2019. 2212–2220. https://doi.org/10.1145/3292500.3330745
[3] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In
Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, September 15-19, 2016.
191–198. https://doi.org/10.1145/2959100.2959190
[4] Shaohua Fan, Junxiong Zhu, Xiaotian Han, Chuan Shi, Linmei Hu, Biyu Ma, and Yongliang Li. 2019. Metapath-
guided Heterogeneous Graph Neural Network for Intent Recommendation. In Proceedings of the 25th ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining. 2478–2486.
[5] Shaohua Fan, Junxiong Zhu, Xiaotian Han, Chuan Shi, Linmei Hu, Biyu Ma, and Yongliang Li. 2019. Metapath-
guided Heterogeneous Graph Neural Network for Intent Recommendation. In Proceedings of the 25th ACM SIGKDD

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 19

International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019.
2478–2486. https://doi.org/10.1145/3292500.3330673
[6] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph neural networks for social
recommendation. In The World Wide Web Conference. 417–426.
[7] Golnoosh Farnadi, Pigi Kouki, Spencer K. Thompson, Sriram Srinivasan, and Lise Getoor. 2018. A Fairness-aware
Hybrid Recommender System. CoRR abs/1809.09030 (2018). arXiv:1809.09030 http://arxiv.org/abs/1809.09030
[8] Aleksandr Farseev, Ivan Samborskii, Andrey Filchenkov, and Tat-Seng Chua. 2017. Cross-domain recommendation via
clustering on multi-layer graphs. In Proceedings of the 40th International ACM SIGIR Conference on Research and
Development in Information Retrieval. 195–204.
[9] Sahin Cem Geyik, Stuart Ambler, and Krishnaram Kenthapadi. 2019. Fairness-Aware Ranking in Search & Recommen-
dation Systems with Application to LinkedIn Talent Search. In Proceedings of the 25th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019. 2221–2231.
https://doi.org/10.1145/3292500.3330691
[10] Xiaowen Huang, Quan Fang, Shengsheng Qian, Jitao Sang, Yan Li, and Changsheng Xu. 2019. Explainable Interaction-
driven User Modeling over Knowledge Graph for Sequential Recommendation. In Proceedings of the 27th ACM
International Conference on Multimedia, MM 2019, Nice, France, October 21-25, 2019. 548–556. https://doi.org/10.
1145/3343031.3350893
[11] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender systems: an
introduction. Cambridge University Press.
[12] Zhuoren Jiang, Yue Yin, Liangcai Gao, Yao Lu, and Xiaozhong Liu. 2018. Cross-language citation recommendation
via hierarchical representation learning on heterogeneous graph. In The 41st International ACM SIGIR Conference on
Research & Development in Information Retrieval. 635–644.
[13] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv
preprint arXiv:1609.02907 (2016).
[14] Jurek Leonhardt, Avishek Anand, and Megha Khosla. 2018. User Fairness in Recommender Systems. In Companion of
the The Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon , France, April 23-27, 2018. 101–102.
https://doi.org/10.1145/3184558.3186949
[15] Mengmeng Li, Tian Gan, Meng Liu, Zhiyong Cheng, Jianhua Yin, and Liqiang Nie. 2019. Long-tail Hashtag Rec-
ommendation for Micro-videos with Graph Convolutional Network. In Proceedings of the 28th ACM International
Conference on Information and Knowledge Management. 509–518.
[16] Weizhi Ma, Min Zhang, Yue Cao, Woojeong Jin, Chenyang Wang, Yiqun Liu, Shaoping Ma, and Xiang Ren. 2019.
Jointly Learning Explainable Rules for Recommendation with Knowledge Graph. In The World Wide Web Conference,
WWW 2019, San Francisco, CA, USA, May 13-17, 2019. 1210–1221. https://doi.org/10.1145/3308558.3313607
[17] Enrico Palumbo, Giuseppe Rizzo, and Raphaël Troncy. 2017. Entity2rec: Learning user-item relatedness from knowledge
graphs for top-n item recommendation. In Proceedings of the eleventh ACM conference on recommender systems. 32–36.
[18] Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, and Heiko Hoffmann. 2019. Explainability
Methods for Graph Convolutional Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition,
CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. 10772–10781. https://doi.org/10.1109/CVPR.2019.01103
[19] Sungyong Seo, Jing Huang, Hao Yang, and Yan Liu. 2017. Interpretable Convolutional Neural Networks with Dual Local
and Global Attention for Review Rating Prediction. In Proceedings of the Eleventh ACM Conference on Recommender
Systems, RecSys 2017, Como, Italy, August 27-31, 2017. 297–305. https://doi.org/10.1145/3109859.3109890
[20] Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and Philip S. Yu. 2017. A Survey of Heterogeneous Information
Network Analysis. IEEE Trans. Knowl. Data Eng. 29, 1 (2017), 17–37. https://doi.org/10.1109/TKDE.2016.2598561
[21] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep Inside Convolutional Networks: Visualising
Image Classification Models and Saliency Maps. In 2nd International Conference on Learning Representations, ICLR
2014, Banff, AB, Canada, April 14-16, 2014, Workshop Track Proceedings. http://arxiv.org/abs/1312.6034
[22] Joan E. Solsman. 2018. Ever get caught in an unexpected hourlong YouTube binge? Thank YouTube AI for that.
https://www.cnet.com/news/youtube-ces-2018-neal-mohan/
[23] Weiping Song, Zhiping Xiao, Yifan Wang, Laurent Charlin, Ming Zhang, and Jian Tang. 2019. Session-Based
Social Recommendation via Dynamic Graph Attention Networks. In Proceedings of the Twelfth ACM International
Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 2019. 555–563.
https://doi.org/10.1145/3289600.3290989
[24] Zhu Sun, Jie Yang, Jie Zhang, Alessandro Bozzon, Long-Kai Huang, and Chi Xu. 2018. Recurrent knowledge graph
embedding for effective recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems,
RecSys 2018, Vancouver, BC, Canada, October 2-7, 2018. 297–305. https://doi.org/10.1145/3240323.3240361
[25] Zhu Sun, Jie Yang, Jie Zhang, Alessandro Bozzon, Long-Kai Huang, and Chi Xu. 2018. Recurrent knowledge graph
embedding for effective recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems.

, Vol. 1, No. 1, Article . Publication date: April 2020.


20 Yang Gao, Yi-Fan Li, Yu Lin, Hang Gao, and Latifur Khan

297–305.
[26] Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. 2018. Latent Relational Metric Learning via Memory-based Attention for
Collaborative Ranking. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon,
France, April 23-27, 2018. 729–739. https://doi.org/10.1145/3178876.3186154
[27] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph
Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada,
April 30 - May 3, 2018, Conference Track Proceedings. https://openreview.net/forum?id=rJXMpikCZ
[28] Chengwei Wang, Tengfei Zhou, Chen Chen, Tianlei Hu, and Gang Chen. 2019. CAMO: A Collaborative Ranking
Method for Content Based Recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33.
5224–5231.
[29] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep knowledge-aware network for news
recommendation. In Proceedings of the 2018 world wide web conference. 1835–1844.
[30] Hongwei Wang, Fuzheng Zhang, Mengdi Zhang, Jure Leskovec, Miao Zhao, Wenjie Li, and Zhongyuan Wang. 2019.
Knowledge-aware graph neural networks with label smoothness regularization for recommender systems. In Proceedings
of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 968–977.
[31] Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, and Minyi Guo. 2019. Knowledge Graph Convolutional Networks for
Recommender Systems. In The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019.
3307–3313. https://doi.org/10.1145/3308558.3313417
[32] Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge Graph Attention
Network for Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019. 950–958. https://doi.org/10.1145/
3292500.3330989
[33] Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. Kgat: Knowledge graph attention network
for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining. 950–958.
[34] Xiang Wang, Xiangnan He, Liqiang Nie, and Tat-Seng Chua. 2017. Item silk road: Recommending items from
information domains to social users. In Proceedings of the 40th International ACM SIGIR conference on Research and
Development in Information Retrieval. 185–194.
[35] Xiang Wang, Dingxian Wang, Canran Xu, Xiangnan He, Yixin Cao, and Tat-Seng Chua. 2019. Explainable Reasoning
over Knowledge Graphs for Recommendation. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI
2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI
Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 -
February 1, 2019. 5329–5336. https://doi.org/10.1609/aaai.v33i01.33015329
[36] Qitian Wu, Hengrui Zhang, Xiaofeng Gao, Peng He, Paul Weng, Han Gao, and Guihai Chen. 2019. Dual graph attention
networks for deep latent representation of multifaceted social effects in recommender systems. In The World Wide Web
Conference. 2091–2102.
[37] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based recommendation
with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 346–353.
[38] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2019. A Comprehensive
Survey on Graph Neural Networks. CoRR abs/1901.00596 (2019). arXiv:1901.00596 http://arxiv.org/abs/1901.00596
[39] Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional Factorization
Machines: Learning the Weight of Feature Interactions via Attention Networks. In Proceedings of the Twenty-Sixth
International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017.
3119–3125. https://doi.org/10.24963/ijcai.2017/435
[40] Chenyan Xiong, Russell Power, and Jamie Callan. 2017. Explicit semantic ranking for academic search via knowledge
graph embedding. In Proceedings of the 26th international conference on world wide web. 1271–1279.
[41] Fengli Xu, Jianxun Lian, Zhenyu Han, Yong Li, Yujian Xu, and Xing Xie. 2019. Relation-Aware Graph Convolutional
Networks for Agent-Initiated Social E-Commerce Recommendation. In Proceedings of the 28th ACM International
Conference on Information and Knowledge Management. 529–538.
[42] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph con-
volutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining. 974–983.
[43] Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. 2019. GNNExplainer: Generating
Explanations for Graph Neural Networks. In Advances in Neural Information Processing Systems 32: Annual Conference
on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada. 9240–
9251. http://papers.nips.cc/paper/9123-gnnexplainer-generating-explanations-for-graph-neural-networks

, Vol. 1, No. 1, Article . Publication date: April 2020.


Deep Learning on Knowledge Graph for Recommender System: A Survey 21

[44] Jianming Zhang, Zhe L. Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. 2016. Top-Down Neural Attention
by Excitation Backprop. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands,
October 11-14, 2016, Proceedings, Part IV. 543–559. https://doi.org/10.1007/978-3-319-46493-0_33
[45] Jiani Zhang, Xingjian Shi, Shenglin Zhao, and Irwin King. 2019. STAR-GCN: Stacked and Reconstructed Graph
Convolutional Networks for Recommender Systems. In Proceedings of the Twenty-Eighth International Joint Conference
on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. 4264–4270. https://doi.org/10.24963/ijcai.
2019/592
[46] Jun Zhao, Zhou Zhou, Ziyu Guan, Wei Zhao, Wei Ning, Guang Qiu, and Xiaofei He. 2019. IntentGC: a Scalable Graph
Convolution Framework Fusing Heterogeneous Information for Recommendation. In Proceedings of the 25th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining. 2347–2357.

, Vol. 1, No. 1, Article . Publication date: April 2020.

You might also like