[go: up one dir, main page]

0% found this document useful (0 votes)
250 views106 pages

Final Project Report

This document appears to be a dissertation submitted by three students - Varnika Srivastava, Vinod Ladda, and Vibush Shanmugam - to PES University in partial fulfillment of the requirements for a Bachelor of Technology degree in Computer Science and Engineering. The dissertation examines the impact of a user's relationships on their interests through analyzing both explicit friendship networks and implicit networks based on user similarity in the Yelp dataset. Through modeling the problem as a link prediction task between users and businesses, the dissertation aims to determine whether explicit or implicit networks are more accurate at predicting these links and therefore a user's interests.

Uploaded by

Arnav Dubey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
250 views106 pages

Final Project Report

This document appears to be a dissertation submitted by three students - Varnika Srivastava, Vinod Ladda, and Vibush Shanmugam - to PES University in partial fulfillment of the requirements for a Bachelor of Technology degree in Computer Science and Engineering. The dissertation examines the impact of a user's relationships on their interests through analyzing both explicit friendship networks and implicit networks based on user similarity in the Yelp dataset. Through modeling the problem as a link prediction task between users and businesses, the dissertation aims to determine whether explicit or implicit networks are more accurate at predicting these links and therefore a user's interests.

Uploaded by

Arnav Dubey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 106

Dissertation on

“Impact of User’s Relationship on User


Interest”

Submitted in partial fulfilment of the requirements for the award of degree


of

Bachelor of Technology in
Computer Science & Engineering

Submitted by:

Varnika Srivastava Vedanti Under the guidanc


Vinod Ladda Vibush Karthee
Shanmugam Internal Guide Pro
01FB15ECS336 Bhaskarjyoti Das
01FB15ECS345 Visiting Professor P
01FB15ECS348 University

January – May 2019

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


FACULTY OF ENGINEERING PES UNIVERSITY (Established under
Karnataka Act No. 16 of 2013) 100ft Ring Road, Bengaluru – 560 085, Karnataka, India
PES UNIVERSITY

(Established under Karnataka Act No. 16 of 2013) 100ft


Ring Road, Bengaluru – 560 085, Karnataka, India
FACULTY OF ENGINEERING

CERTIFICATE
This is to certify that the dissertation
entitled

‘Impact of User’s Relationship on User


Interest’

is a bonafide work carried out by

Varnika Srivastava Vedanti 01FB15ECS336


Vinod Ladda Vibush Karthee 01FB15ECS345
Shanmugam 01FB15ECS348

In partial fulfilment for the completion of eighth semester project work in the Program of Study

Bachelor of Technology in Computer Science and Engineering under rules and regulations of PES

University, Bengaluru during the period Jan. 2019 – May. 2019. It is certified that all corrections /

suggestions indicated for internal assessment have been incorporated in the report. The
dissertation
th
has been approved as it satisfies the 8 semester academic requirements in respect of project
work.

Signature Prof. Faculty


Bhaskarjyoti Das
Associate Professor
Signature Dr.
External Viva
Shylaja S S
Chairperson
Signature Dr. B K Name of the Examiners
Keshavan Dean of
Faculty
Signature Dr. B K 1. __________________________
Keshavan Dean of
2. __________________________ __________________________
Signature with Date

__________________________
DECLARATION

We hereby declare that the project entitled “Impact of User’s


Relationship on

User Interest” has been carried out by us under the guidance of


Professor

Bhaskarjyoti Das, Adjunct Faculty and submitted in partial fulfillment of


the

course requirements for the award of degree of Bachelor of Technology


in

Computer Science and Engineering of PES University, Bengaluru


during the

academic semester January – May 2019. The matter embodied in this


report has

not been submitted to any other university or institution for the award of any

degree.

01FB15ECS336 Varnika Srivastava <Signature>

01FB15ECS345 Vedanti Ladda <Signature>

01FB15ECS348 Vibush Shanmugam <Signature>


ACKNOWLEDGEMENT

We would like to thank our guide Prof. Bhaskarjyoti Das for his guidance, support
and

patience throughout our final year project, without which our project would be

incomplete.

We appreciate the efforts made by the project co-ordinators at PES University –


Prof.

Preet Kanwal and Prof. Sangeetha V, for efficiently managing the project logistics
for all

students of Computer Science


Department

We are extremely grateful to our Chairperson, Computer Science and


Engineering,

Dr.Shylaja S Sharath, for her valuable support throughout her time as the
Chairperson.

We would like to thank Dr. B K Keshavan, for managing his responsibilities as a


Dean of

Faculty extremely well.

Our Vice Chancellor, Dr. K N Balasubramanya Murthy, must be credited for not
only

managing the day-to-day operations of the institution, but also constantly


encouraging

students to do better.

We are grateful to our CEO and pro chancellor, Dr. Jawahar Doreswamy, for
envisioning

the idea of giving students an opportunity to gain practical exposure through


internships

in addition to traditional learning, and giving them an all-round experience while

studying at this esteemed


institution

Our acknowledgement would be incomplete without thanking our Chairman and

Chancellor, Dr. M R Doreswamy, for founding this prestigious


institution.

Last, but not the least, we thank our parents for their gratuitous
support.
ABSTRACT

A user’s direct or hidden relationship to another user can have impact on his interests. A
direct

relationship would mean a network of friends where users exercise their own choice,
and

hidden relationship would mean a network of similar users based on certain user
attributes.

In this light, we study various social networks that can impact user interest. In an explicit

network of friends, the links are likely to get stale with time as users may not interact
frequently

with all their friends. Hence, an implicit network based on user similarity, should provide
more

relevant information for predicting user


interest.

In case of the Yelp dataset, the interest is the business. The task of finding which
business

interests a user can be modelled as a link prediction problem between user and
business.
User relationship from social networks, can be captured using node embedding and this

information can be used to perform link prediction. Explicit networks and implicit
networks

were compared for their accuracy at this link prediction


task.

It was observed that implicit networks outperformed the explicit friends network, for
similar

settings, for predicting links between user and


business.
TABLE OF CONTENTS

Introduction ...............................................................................................................1
1.1 Overview .......................................................................................................................................1

1.2 Scope ............................................................................................................................................2

1.3 Objective .......................................................................................................................................4

1.4 Outcomes .....................................................................................................................................5

Research Background ..............................................................................................6


2.1 Literature Survey ........................................................................................................................6 2.1.1
Analyzing Implicit Social Networks in Multiplayer Online Games [7] ..........................................6

2.1.2 Suggesting (More) Friends Using the Implicit Social Graph [2] ....................................................7
2.1.3 Identifying Implicit Relationships between social media users to support social commerce [8] ........8
Implementation ...................................................................................................................................9
Results ............................................................................................................................................ 10

2.1.4 Measuring Tie Strength in Implicit Social Networks [9] ............................................................ 10

2.1.5 Learning to Recommend with Explicit and Implicit Social Relations [3] .................................... 12
2.1.6 Search Lens [10] .................................................................................................................... 13

2.1.7 Graph Embeddings for Social Network Analysis: State of the Art [11] ....................................... 14

2.1.8 Representation Learning and Graph Classification [12] ............................................................. 14

2.1.9 DeepInf: Modeling Influence Locality in Large Social Networks [13] ........................................ 15
2.1.10 Attributed Social Network Embedding [14] ........................................................................... 15

2.1.11 Network Representation Learning with Rich Text Information[15] .......................................... 15

2.1.12 Heterogeneous Network Embedding via Deep Architectures [16] ........................................... 16


2.1.13 Node2Vec: Scalable Feature Learning for Networks [4].......................................................... 16
2.1.14 Link Prediction in Heterogeneous Social Networks [19] ......................................................... 18

2.1.15 A Link Prediction Approach to Recommendations in Large-Scale User-Generated Content Systems


[20] ................................................................................................................................................. 18

Methodology............................................................................................................ 20
3.1 Proposed Approach ................................................................................................................. 20

3.2 Improvement over Alternative Approaches .......................................................................... 20

3.3 High Level System Architecture ............................................................................................. 22

Environment Requirements ................................................................................... 25


4.1 Hardware Requirements .......................................................................................................... 25

4.2 Software Requirements ........................................................................................................... 25

4.3 Data Requirements .................................................................................................................. 27

Demonstration of Outcome .................................................................................... 31


Proposed Approach ................................................................................................ 33
6.1 Creating User Explicit network of friends ............................................................................. 33

6.2 Creating User Business bipartite network ............................................................................ 33

6.3 Creating Implicit Network Type 1 ........................................................................................... 33

6.4 Creating Implicit Network Type 2 ........................................................................................... 34

6.5 Node Embedding ..................................................................................................................... 34

6.6 The Link Prediction Problem ................................................................................................... 35

6.7 Demo Approach ........................................................................................................................ 36

Detailed Design ....................................................................................................... 39


7.1 Creating the networks ............................................................................................................. 39

7.2 Performing Node2Vec ............................................................................................................. 42

7.3 Link prediction ......................................................................................................................... 44

Results ..................................................................................................................... 47
8.1 Visualizations of networks...................................................................................................... 47 8.1.1
User explicit network of friends .............................................................................................. 47

8.1.2 User implicit network type 1 ................................................................................................... 49

8.1.3 User implicit network type 2 ................................................................................................... 50 8.1.4


Comparison of the three networks ........................................................................................... 51

8.2 Link prediction results ............................................................................................................ 52

Conclusions ............................................................................................................ 55

Future Work ............................................................................................................. 56

References .............................................................................................................. 57

Appendices ............................................................................................................. 59
12.1 Implicit Network Type 2 after randomly sampling the edges to reduce the size ............. 59

12.2 Visualization code ................................................................................................................. 60

12.3 Network creation code .......................................................................................................... 62

12.4 Node embedding code .......................................................................................................... 66

12.5 Link prediction code ............................................................................................................. 66


List of Tables

Table No. Title Page

Number

2.1.4.1 Axioms satisfied by various similarity measures 11

2.1.13.1 Area Under Curve Scores for Link Prediction using the
Hadamard operator
18

6.5.1 Node2Vec Parameters for the four networks 34

8.1.4.1 Comparison Table for the user explicit and user implicit
parameters
52
networks in terms of network

8.2.1 Model Accuracy and Recall, Precision F1-Score for class 1 53


8.2.2 Model Accuracy and Recall, Precision F1-Score for class 0 53
List of Figures No.
Title Page

Number
Figure

2.1.4.1 Measuring Kendall-Tau Rank correlations on rankings produced

by different pairs of tie-strength


measures

2.1.13.1 Node2Vec Algorithm

3.2.1 Module Dependencies

3.2.2 High level architecture diagram

3.4.1 A sample user JSON

3.4.2 A sample business JSON

3.4.3 A sample review JSON

5.1 Use case diagram of the user interface

6.7.1 A sample view of the UI for recommendations page

7.1.1 A snapshot of the user feature vectors file

7.1.2 High Level Flowchart for network creation

7.2.1 High Level Flowchart for Network Embedding using Node2Vec

7.3.1 High Level Flowchart for Link Prediction

8.1.1.1 Degree histogram of the user explicit network

8.1.1.2 Logarithmic plot for degree and count for explicit network

8.1.1.3 Visual for the entire user explicit network

8.1.2.1 Degree histogram of the user implicit network type 1


8.1.2.2 Logarithmic plot for degree and count for implicit network type 1

8.1.2.3 Visual for the entire user implicit network type 1

8.1.3.1 Degree histogram of the user implicit network type 2

8.1.3.2 Logarithmic plot for degree and count for implicit network type 2

8.1.3.3 Visual for the entire user implicit network type 2


12.1.1 Degree histogram of the subgraph of user implicit network type 2

after random sampling of edges from


original

12.1.2 Logarithmic plot for degree and count for subgraph of implicit

network type 2

12.1.3 Visual for the subgraph of user implicit network type 2

12.2.1 Reading graph from an edgelist file

12.2.2 Computing number of nodes and edges

12.2.3 Computing network density

12.2.4 Degree distribution histogram plot

12.2.5 Logarithmic plot of degree and count

12.2.6 Computing Global Clustering Co-efficient

12.3.1 Creating the user explicit network of friends

12.3.2 Creating the user business review network

12.3.3 Creating the implicit network type 1

12.3.4 Creating the implicit network type 2

12.4.1 Performing Node Embedding using Node2vec

12.5.1 Creating the dataset by adding embedding vector


12.5.2 Training SVM Model

12.5.3 Testing the SVM model and computing accuracy

12.5.4 Computing Recall, Precision, F1-score for each class k

12.6.1 UI Snapshot 1

12.6.2 UI Snapshot 2
Definitions, Acronyms and
Abbreviations

1. Implicit networks – networks that are created by defining relationships based on


some

inherent property of the system

2. Explicit networks – networks that are created by defining relationships that are

explicitly created in the system.

3. Bipartite networks – the network can be divided into two sets of nodes where an
edge

exists only between nodes that belong to different


sets

4. Edges/links – connection between two nodes indicating some relationship


between the

two nodes

5. Graph/Network embedding - Network embedding is an important method to learn


low-

dimensional representations of vertices in networks, aiming to capture and


preserve the

network structure. Note that a network is represented as a graph (hence graph

embedding) and hence the terms network embedding and graph embedding are
used

interchangeably
6. Node2Vec - an algorithmic framework for learning continuous feature
representations

for nodes in networks. In node2vec, nodes are mapped to a low-dimensional


space of

features that maximizes the likelihood of preserving network neighborhoods of


nodes.

7. LINE - Large-scale Information Network Embedding A network embedding


algorithm

suitable for arbitrary types of information networks: undirected, directed, and/or

weighted and preserves both the local and global network


structures.

8. DeepWalk – A network embedding algorithm that uses deep learning to learn


node

representations. And is considered to be a special case of the Node2Vec


algorithm

9. API – Application Programming Interface

10. Links or edges – a connection between two nodes


(entities)

11. Accuracy – fraction of total number of instances, that the model predicted
correctly

12. Precision – fraction of positive instances(predicted by the model) that the model

predicted correctly

13. Recall – fraction of actual positive instances that the model predicted
correctly.

14. F1-score – harmonic mean of precision and


recall

15. JSON – JavaScript Object Notation

16. DeepInf-GAT - Graph Attention Network


17. DeepInf-GCN - Graph Convolutional
Network
18. TADW - Text Associated Deep Walk

19. Jaccard similarity – measures the number of common neighbours normalized by


the

total number of neighbors between a node


pair

20. Cosine Similarity – measure the cosine angle between two non-zero
vectors

21. Hadamard product – a matrix operation wherein two matrices A and B with the
same

dimensions are multiplied element-wise to create a new matrix C


where

Ci = Ai ∗ Bi

22. Common neighbor – number of directly connected vertices that two nodes have
in

common

23. Network density – fraction of the number of the maximum possible edges in a
graph

that actually exist in the graph

24. Global clustering co-efficient – an indicator of the tendency of nodes to form


clusters,

on a global scale

25. Node degree – the number of connections to the other nodes (in an undirected
graph)

Impact Of User’s Relationship on User’s Interest

CHAPTER 1

Introduction

1.1 Overview
In the present-day context, a large amount of user-based content is being
generated,

posing multiple challenges such as personalizing user content, recommending


relevant

products, etc. While there are multiple ways to solve these issues, one possible
way is

to use the available social networks to extract user relationships and predict user

interests based on these


relationships.

In this light, we have explored various types of social networks such as explicit

networks, implicit networks and bipartite networks, in terms of extracting user

relationships from these networks via node embedding and then using this
information

to predict user interest.

We consider product based social networks where users are allowed to review
products.

Yelp and Amazon are two examples of product based networks, each willing to

recommend an appropriate product to its users. Since Yelp also allows users to

explicitly form friends, review businesses such as restaurants, and provides a

convenient API to download the required data, it was ideal for creating multiple
types

of networks and studying their impact on user


interest.

Many recommendation systems rely on modeling user behaviour by providing

collaborative filtering [1] from user-item rating matrix alone and some social

recommenders take into consideration the user’s explicit network of


friends.

The problem with using only ratings is that specific attributes of a business, which
are

liked or disliked by a user, are not captured adequately. For example, a user
rated a

restaurant business as 2 because he dislikes the cuisine, and this rating might
deter

another user who loves the cuisine, from visiting the


business.

Department of CSE Jan-May 2019 1 | Page

Impact Of User’s Relationship on User’s Interest

In case of considering only an explicit network, the links of friendships get stale
over

time as users, especially those having a large group of friends, will not interact
equally

with all the friends. Moreover, some users choose not to have friends and simply
use

the web to review products. For such users, an explicit network of friends cannot
be

created.

What we propose is that modeling user behaviour using an implicit network


based on

user similarity to other users, carries more relevant information which may be
used to

improve recommendations. Multiple studies [2] [3] show that implicit networks, or
a

combination of implicit and explicit networks, are better at modeling user


interests.

Recommending an item such as a restaurant, music, etc, to a user can be


modeled as
predicting a link between that item and user. Note that this link does not exist at
present,

but is likely to form in the future. The link prediction task is essentially a machine

learning prediction problem, and has been modelled as binary classification in


our

research.

In order to perform the link prediction, any network information has to be


converted to

a suitable representation in order for the machine learning models to function


well. To

achieve this, we use the network embedding technique to convert the network

(represented as a graph) into a vector format. This vector can be used as input to
the

machine learning models for link


prediction.

1.2 Scope

The project was split into four high level


modules:

1. Network Creation

2. Visualizations of networks (network analysis)

3. Network embedding

4. Link prediction

Three types of networks were


created:

• User explicit network of friends

• User - Business bipartite network


Department of CSE Jan-May 2019 2 | Page

Impact Of User’s Relationship on User’s Interest

• User implicit network

Two types of implicit networks created


were:

• Based on users that visit the exact same businesses (referred to as implicit

network Type 1)

• Based on users that visit similar businesses (referred to as implicit network


Type

2)

• User similarity thresholding model was used to create an edge between a


given

pair of users

Multiple models were constructed using the following six combinations of the
above

networks:

1. User explicit network (baseline model)

2. Implicit Type 1 network

3. Implicit Type 2 network

4. Combination of user explicit network and user-business bipartite


network

5. Combination of implicit Type 1 network and user-business bipartite


network

6. Combination of implicit Type 2 network and user-business bipartite


network

For graph embedding, we will be using state of the art tool - Node2Vec [4]
among the
multiple other graph embedding techniques such as LINE [5] and DeepWalk
[6].

In order to demonstrate the differences in the models built, a simplistic UI


that

captures a user’s interest in a set of businesses and friends was


created.

Limitations of the Project

• We will not be focusing on extensive UI for the demo. Minimal functionality

will be presented, sufficient to compare the output of the six


models.

• We don’t aim to build a social recommendation system, but rather focus on


the

network embedding and link prediction


problems.

• While considering businesses, only restaurants were used and user


similarity

was based on cuisine preference extracted from the restaurants they


visited

Department of CSE Jan-May 2019 3 | Page

Impact Of User’s Relationship on User’s Interest

• For link prediction, machine learning models were limited to Support Vector

Machines

• In case of network embedding, owing to memory constraints (16GB RAM),


the

edgelist file sizes were limited to 6MB to avoid out-of-memory errors and

system crash.

• Running memory-intensive code on cloud instances were considered, but it


proved to be time-consuming to set up and expensive in the long-run and
hence

we were forced to reduce dataset size for them to run on normal


laptops.

1.3 Objective

Our project investigates the following:

1. How well implicit networks can capture user behavior compared to user
explicit

networks.

2. If a combination of the user business bipartite network with the other


networks

improves link prediction performance (predicting user


interest)

Having prepared the required networks, graph embedding was performed using

Node2Vec

The Yelp dataset was used to perform the research. The Yelp dataset contains
users,

explicit friends and reviewing businesses (businesses such as restaurants,


barber shops,

automotive shops, etc). The Yelp platform allows a user to review such a
business, and

connect with other users.

Performance of the six models mentioned in 1.2, at the link prediction task can
be
measured using:

1. Test accuracy

2. Precision

3. Recall

4. F1-Score
Department of CSE Jan-May 2019 4 | Page

Impact Of User’s Relationship on User’s Interest

1.4 Outcomes

Network analysis will be done for user explicit network and the two kinds of user

implicit networks

The results of six models, will be presented in a tabular


column.

We expect the implicit network based models to exceed the baseline model in

performance.

To make it more appealing to an end-user, we propose a simplistic UI that allows


a user

to select friends, click on multiple businesses, say to look at them, and on


registering

this information, we aim to recommend a suitable business by using the six types
of

models.

Note that the UI and functionality is not the main goal of the project, we want to
focus

our efforts and devote most of the time on building the newly proposed
models.
Department of CSE Jan-May 2019 5 | Page

Impact Of User’s Relationship on User’s Interest

CHAPTER 2

Research
Background

Given that our project has components of network creation, node embedding and link

prediction, we review literature that expands our knowledge in these


areas.

We also study literature that is a possible use case of our project using an alternate
approach,

and contrast their approach with


ours.

We will also be looking at papers that come up with an appropriate network embedding
from

the user’s network (explicit and implicit) and use it for the link prediction
task.

2.1 Literature Survey

The following papers present an experiment with implicit networks. They help us
understand

how implicit networks can be created, and how they are useful in solving the
problem.

2.1.1 Analyzing Implicit Social Networks in Multiplayer Online Games


[7]
Introduction

Online multi-player gaming websites need the players to select multiple other
players

to form teams. Selecting the right kind of teammates is crucial for a good
game

experience. For this use case, the paper has explored ways of creating an
implicit

network based

Implementation

For the implicit gaming network, six types of player interactions have been
modelled:

1. Players are present in the same


match

2. Players are present on the same side of a


match.

3. Players are opponents

4. Players who won matches together

5. Players who lost matches together


Department of CSE Jan-May 2019 6 | Page

Impact Of User’s Relationship on User’s Interest

6. Percentages of games overlap

Point 6 can be further divided into percentage matches lost together, percentage

matches won together, etc

The paper proposes an algorithm that takes as input a list of all the players who
have

logged into the game. Using the implicit network, the algorithm then computes
the

group membership for each


player.

Players from the same group are assigned to new matches. If a group is too
large as per

the rules of the game, the group is split into


two.

Conclusion

Using implicit networks for grouping online players for matches provides a better

gaming experience than randomly assigning players to teams while scheduling a


match.

2.1.2 Suggesting (More) Friends Using the Implicit Social Graph


[2]

Introduction

This paper uses the fact that people implicitly form groups just by interacting with

different people, and based on this interaction an implicit network is generated. In


an

implicit group, people do not explicitly specify who should be part of this group.
This

is different from explicit networks where users can specify their group of
friends.

This paper suggests a new algorithm to suggest friends by first generating an


implicit

network based on interactions captured via email (Gmail to be specific). This

interaction based implicit network is expected to capture user affinity towards


other

users, and hence provide a ranking of friends to be suggested (some friends are
ranked
higher meaning that user is more likely to make friends with
them)

Implementation

Creation of implicit network for the Friend Suggest


algorithm

Department of CSE Jan-May 2019 7 | Page

Impact Of User’s Relationship on User’s Interest

The directed network is created by building a user’s egocentric network,


connecting the

user to other users who are either senders of an email to the user, or recipients
of the

email sent by the user.

The edges of the implicit network also have edge weights which are based on the

following three attributes of the interaction:

1. Frequency – Higher the frequency of interaction, higher the priority of the


group

2. Recency – The more recent the interaction, the higher priority it


gets

3. Direction – The interactions initiated by the user is given more priority over

interactions not initiated by him

The paper proposes a metric called the Interaction Rank whose value is decided
by

totaling the number of emails sent or received by a user, to or from an implicit


group.

It also considers this interaction’s recency to set priority for more recent
interactions.
The Friend Suggest algorithm finally uses this weighted, directed implicit network
and

predicts new friends given a set of existing


friends.

The paper suggests an application where the user can select the recipients of the
mail,

and this system can suggest more recipients which haven’t already been
added.

Conclusion

The implicit network was based on certain attributes of user interaction such as

interaction-frequency, initiator of the interaction (resulting in a directed graph)


and the

time of the interaction (how recent was the latest interaction). The Friend
Suggest

algorithm was based on this network wherein given a few initial friends, more
new

possible friends were suggested. While this research was implemented using
Gmail, it

could be extended to any interaction based


network.

2.1.3 Identifying Implicit Relationships between social media users to


support social commerce
[8]
Introduction This paper claims that implicit network is better than explicit
network since users with

similar interests need not interact with each other


directly.
Department of CSE Jan-May 2019 8 | Page

Impact Of User’s Relationship on User’s Interest

In order to form implicit networks, temporal analysis techniques are used, and
this

exceeds in performance compared to models built with explicit


networks.

The similarity to our work is that, for temporal analysis, user features vectors are

formed, representing how active a user is. We create user feature vector based
on

attributes of the business visited by the


user.

However, we do not use temporal analysis but rather use jaccard and cosine
similarity

to compute user similarity. Unlike this research, we also make use of node
embedding

to perform link prediction.

Implementation

Networks were created from


Digg.com

To form an implicit network, each user has a user feature


vector.

User feature vector for a particular user u are created such that an element in the
user

feature vector vi is computed as follows:

N (u) U
vi = i ∑ j Ni(j) where Ni(u) is the number of posts made by user u
on a particular day i.

U is the total number of posts made by all users U on the particular


∑ j Ni(j)
day i

Using the user feature vectors, temporal similarity is calculated between any
given user

pair and a user similarity matrix is formed. The elements of the matrix represent
the

strength of the similarity between two user


pairs.

A thresholding mechanism is used to create edges between users i.e; an edge is


formed

between two users if the similarity exceeds the threshold value. In this manner,
an

implicit network is created.

A Top-K filtering method is also used to create the implicit network. Here, edges
to

other nodes are ranked such that more similar pairs have a higher rank. K is the

percentage of users that must be retrieved from the ranking


list.

Department of CSE Jan-May 2019 9 | Page

Impact Of User’s Relationship on User’s Interest

To create an explicit network which is the baseline model, an edge was made
between

two users who commented on the same story and in another type, edge was
created

between the author of a story and the commentor on the same


story.

Results
The quality of this implicit network is judged using Precision and Recall, and a
table is

presented for different values of thresholding. It also uses the alternative Top-K
filtering

method to create the implicit network.

Precision and Recall is also stated for the two types of explicit networks
created.

This output is compared and the implicit network is seen to have outperformed
the

explicit network.

2.1.4 Measuring Tie Strength in Implicit Social Networks


[9]

Introduction

Calculating tie strength is useful for applications that require ranking edges
between

two nodes. This paper presents a set of axioms that a measure for tie strength
can satisfy.

It compares the various tie strengths such as Common neighbours, Jaccard


similarity,

Adamic-Adar, etc. for the axioms they


satisfy.

The Kendall-Tau Rank Correlation between various tie-strength measures is


presented.

Implementation

The paper mentions the following


axioms:
1. Axiom 1 – Isomorphism : Tie strength between any two given nodes
depends

on the link structure and not on the labels assigned to the


nodes

2. Axiom 2 – In an empty graph, tie strength between any two given nodes in
that

graph will be 0. In a graph where only two nodes are present, with a single
edge

between them, then tie strength will be


1

3. Axiom 3 – Node pairs with more features in common will have higher tie

strength

Department of CSE Jan-May 2019 10 | Page

Impact Of User’s Relationship on User’s Interest

4. Axiom 4 – Intimacy : Node pairs having a smaller total list of features, are
likely

to have higher tie strength

5. Axiom 5 – Popularity : Given a large set of features for a node, more


number of

nodes will be similar to that node

6. Axiom 6 – Conditional Independence of vertices : The tie strength of a


particular

node to other vertices does not depend on features that the node does not
have;

it only depends on features that the node


has

7. Axiom 7 – Conditional Independence of Events : Increase in the strength of


a

tie between two nodes due to a feature, is independent of the effect of


other
features

8. Axiom 8 – Sub-modularity: The marginal increase in tie strength between a


pair

of nodes due to a particular feature Q is at most the tie strength between


the

same node pair if Q was the only feature they


had.

Conclusion

Jaccard similarity satisifies axioms 1-


5.

Table 2.1.4.1 - Axioms satisfied by various similarity measures

The paper presents the following observations about the Kendall-Tau Rank
Correlation:

Common Neighbor and Max show very low values of correlation (-0.2 <= τ <=
0.2)

across the datasets

Department of CSE Jan-May 2019 11 | Page


Impact Of User’s Relationship on User’s Interest

Adamic-Adar, Delta, and Linear measure of tie strength show a strong positive

correlation (τ > 0.6) irrespective of the


dataset.

Jaccard similarity is not mentioned as it does not satisfy all the


axioms.

Figure 2.1.4.1 Measuring Kendall-Tau Rank correlations on rankings produced by different pairs of tie-strength
measures on different datasets (a) Adamic-Adar(AA) Linear(L) and Delta(D) (b) Common Neighbour (CN) and Max
(c) Adamic-Adar(AA) Linear(L) and Common Neighbour (CN) (d) Adamic Adar, Max, Delta and Linear

2.1.5 Learning to Recommend with Explicit and Implicit Social


Relations
[3]
Introduction

In the real world, users usually look up to their trusted friends for
recommendations.

This fact is not captured by traditional recommender systems as they assume the
independence and identical distribution of the
users.

This paper proposes a “factor analysis framework”, combining user preference


and his

friends’ preference. This framework is generic and scalable and can be used in
case

explicit trust network of a user is not


available.

Implementation

Department of CSE Jan-May 2019 12 | Page

Impact Of User’s Relationship on User’s Interest

This method takes care of two scenarios – one where user item matrix and an
explicit

friends network is present, and another where only the user item matrix is given.
In the

second case, an implicit network is created using vector space similarity and
pearson

correlation co-efficient are used on the user item matrix. A maximum ‘K’ number
of

users will be similar to a given user.

Conclusion

A large value of K will introduce some noise in the trust network and a small
value of

K will not suffice to effectively represent trust network. Commonly, sparse data is
an

issue, and by also implementing diffusion of information in a network, the data


sparsity

issue can be overcome.


2.1.6 Search Lens [10]

Introduction

SearchLens is a novel tool, where users can create a collection of “Lenses” that
captures

their interests. The aim is to enable users to use various combinations of lenses
in order

to search for more relevant items in various diverse contexts such as suggesting
places

to visit for a traveler in a new city, suggesting appropriate


products.

A key feature of SearchLens is transparency owing to its ability to generate user

personalized views with visual


explanations.

Implementation

While this research has the potential to be applied to numerous contexts, it has
been

limited to restaurant reviews.

A Lens is defined as a reusable set of weighted keywords that efficiently capture


user

interest, and can be used in different configurations to suit users’


requirements.

The Lens model was implemented on a subset of the Yelp Dataset of 48,000
restaurants

and 2 million reviews.

Upon the creation or enabling of a Lens, a visual representation is shown to the


user,

displaying the quality of match between the Lens criteria and the content. For
example,
displaying the frequency of occurrence of all the keywords in the Lens present in
the

given reviews.

Department of CSE Jan-May 2019 13 | Page

Impact Of User’s Relationship on User’s Interest

Conclusion

SearchLens uses the concept of Lenses to capture user interest. Providing a

visualization saves the user the effort of explicitly specifying keywords to


search.

This is a keyword based extraction model and no creation of networks takes


place. Since

keywords are used, Word2Vec is used to convert them to a meaningful vector. A


model

is trained on this vector, producing similar vectors for semantically similar


words.

2.1.7 Graph Embeddings for Social Network Analysis: State of the Art
[11]

Defines what graph embedding is, and lists how certain types of graph
embedding

methods can be used in machine learning algorithms, such as node


classification.

Explains how neural networks can improve node embedding, and how we can
use tools

created for word embedding for graph embedding: Word2Vec, Continuous Bag of

Words, and Skip Gram.

This work also details major graph embedding algorithms, such as DeepWalk,
LINE,
node2vec, and compares experimental results for all the three algorithms on
three

different datasets.

Finally, the text discusses techniques for bipartite graph embedding, community

structure preservation and dimensionality reduction in


brief.

2.1.8 Representation Learning and Graph Classification


[12]

This paper claims that successful embedding methods for the heterogeneous
should

jointly consider the multiple content types and structural properties of the graphs.

However, different vertex types have different semantics, hence are


incomparable in

general. Therefore, using methods such as DeepWalk or LINE that does not

discriminate among different relation types result in a poor


performance.

This paper also provides a description of Graph Representation Learning for

Heterogeneous network and proposes an approach to tackle this by separating


the

networks of different types and jointly learn a representation for


them.

Department of CSE Jan-May 2019 14 | Page

Impact Of User’s Relationship on User’s Interest

The following papers cover different aspects of graph embedding and different
algorithms with

more detail, as well as experimental


results:
2.1.9 DeepInf: Modeling Influence Locality in Large Social Networks
[13]

Focuses on using Deep Learning for Graph Representation Learning to predict


social

influence i.e.; predict user’s behaviour based on that of his neighbours, taking
into

consideration the vertex level


features.

Two variants are proposed - DeepInf-GCN(Graph Convolutional Network) and

DeepInf-GAT(Graph Attention) ; GCN performs worse than other approaches


(due to

homophily assumption)

Limitation of this approach is that the ego-network features are not used in these
two

models and hence important information might be


missed

2.1.10 Attributed Social Network Embedding [14]


The proposed model in this paper considers network attributes as well as
structure.

The Social Network Embedding framework proposed can learn more informative

representations, achieving substantial gains on the tasks of link prediction and


node

classification. Specifically, SNE significantly outperforms


node2vec.

The final SNE model integrates the models of structures and


attributes.
2.1.11 Network Representation Learning with Rich Text
Information[15]
Focusses on text-associated DeepWalk, a network representation learning
method

based on the state-of-the-art DeepWalk. The paper proves that DeepWalk


algorithm

actually factorizes a matrix M and figure out the closed form of M. Introduces text

features into Netwrok Representation Learning and gets a 5% to 20% advantage


as

compared to other baselines especially when the training ratio is small.

Experimental results on three datasets with different training ratios show the

Department of CSE Jan-May 2019 15 | Page

Impact Of User’s Relationship on User’s Interest

effectiveness and robustness of TADW as compared to other baselines.

2.1.12 Heterogeneous Network Embedding via Deep Architectures


[16]

This paper describes a deep embedding algorithm for networked data, using a
highly

nonlinear multilayered embedding


function.

The Heterogeneous Network Embedding Framework is presented


mathematically; the

embedding process encodes both heterogeneous content and linkage


information to a

multidimensional representation for each


object.

Results for classification, multi modal search and clustering show that the HNE
representation gives the most accurate results, when compared to only content,
links,

etc. being represented in the graph embedding


structure.

2.1.13 Node2Vec: Scalable Feature Learning for Networks


[4]

Introduction

This paper provides us with the Node2Vec algorithm, where nodes are mapped
to

feature vectors while maximizing the probability of keeping the network structure

intact. It has various parameters to control the random walk and therefore works
on

diverse network structures. This flexibility helps in learning more meaningful


vectors.

While proposing a novel method for graph representation learning, node2vec is

compared to existing state-of-the-art techniques on the tasks of multi-class

classification and link prediction.

Implementation

The Node2Vec algorithm is as follows:

Department of CSE Jan-May 2019 16 | Page


Impact Of User’s Relationship on User’s Interest

Figure 2.1.13.1 – The Node2Vec Algorithm

We see parameters like walk length, dimensions of the output vector for the node
or

edge embedding, return parameter that controls the likelihood of re-visiting a


node, and

the in-out parameter, which controls the depth and breadth of the
search.

A high value for the in-out parameter encourages the algorithm to explore the
network

further away from the node in question. A low value encourages forming the

representation of the local network.

Conclusion

Table 2.1.13.1 shows Node2Vec’s highest Area Under Curve score


recorded.

This paper also compares Node2Vec to other embedding algorithms such as


LINE and
DeepWalk for link prediction task, with Node2Vec outperforming each of
those.

DeepWalk algorithm is considered to be a special case of


Node2vec

Given this information, we will be using Node2Vec for the embedding


process

The same authors have a GitHub repository with python 2 implementation of

Node2Vec [17]. However, to maintain uniformity in software versions, we will use


a

python 3 implementation of Node2Vec


[18]

Department of CSE Jan-May 2019 17 | Page

Impact Of User’s Relationship on User’s Interest Facebook Protein-Protein


Algorithm Dataset Interac
arXiv
Department of CSE Jan-May 2019 18 | Page
Spectral Clustering 0.612 0.4920 0.5740

DeepWalk 0.9680 0.7441 0.9340

LINE 0.9490 0.7249 0.8902

Node2Vec 0.9680 0.7719 0.9366

Table 2.1.13.1 Area Under Curve Scores for Link Prediction using the Hadamard operator. Node2Vec has the
highest scores

The following sections provide a summary on literature reviewed for Link prediction

approaches

2.1.14 Link Prediction in Heterogeneous Social Networks


[19]
Link Prediction is more complex in heterogeneous networks where there are
different

types of links. This paper takes a “multi-task metric learning” approach to link

prediction wherein each link type has a particular distance measure computed
based on

network and node features.

This paper describes the limitations of the current unsupervised and supervised

approaches as being unable to pick out relevant features from large noisy
datasets.

While the novel models proposed in this research are robust to noisy data, its

performance on large number of link types is not


known.

2.1.15 A Link Prediction Approach to Recommendations in Large-


Scale
User-Generated Content Systems
[20]
In user-generated content system, the amount of data is very large, and a K-
Nearest

Neighbours algorithm employed on a user item matrix will not scale well. The

experiments conducted in this paper show that considering the immediate

neighbourhood rather than the global network structure is better for performing
link

prediction. Using explicit networks for link prediction only marginally improves

performance.

Impact Of User’s Relationship on User’s Interest

The paper explores six link prediction


methods:

i. Adamic-Adar

ii. Common Neighbour

iii. Global Popularity

iv. Page Rank

v. Katz Measure

vi. Rooted Page Rank

Among all the six Link Prediction algorithms, the node-neighborhood based
algorithms

i.e. common neighbor and Adamic Adar outperformed the others. They also

outperformed the traditional collaborative filtering


method.
Department of CSE Jan-May 2019 19 | Page

Impact Of User’s Relationship on User’s Interest

CHAPTER 3

Methodology

3.1 Proposed Approach

a. We will essentially compare the effectiveness of user explicit network alone


vs.

implicit networks

b. We will also investigate the combination of the explicit and implicit networks

with user-business network


features

c. User being interested in a particular business will be modelled as a link

prediction task

d. Baseline model : with user explicit network embedding


alone

e. Non-baseline models:

▪ Implicit Type 1 network

▪ Implicit Type 2 network

▪ Combination of user explicit network and user-business bipartite

network

▪ Combination of implicit Type 1 network and user-business bipartite

network

▪ Combination of implicit Type 2 network and user-business bipartite

network

3.2 Improvement over Alternative Approaches


1. Recommendation systems rely on modeling user behaviour by providing

collaborative filtering [1] from user-item rating matrix alone and some social

recommenders take into consideration the user’s explicit network of


friends.

2. The problem with using only ratings is that specific attributes of a business
which

are liked or disliked by a user are not captured adequately. For example, a
user rated

a restaurant business as 2 because he dislikes the cuisine, and this rating


might deter

another user who loves the cuisine, from visiting the


business.

3. Current generation social recommenders rely on explicit social networks


but:

Department of CSE Jan-May 2019 20 | Page

Impact Of User’s Relationship on User’s Interest

o Links in explicit social network become stale with time

o Links in an explicit social network do not reflect behavior dimension

o Not every user has an explicit network of friends

4. Ourapproach using implicit networks based on user similarity, has more


relevant

information than explicit network, since users with similar interests are
grouped

together.

5. We will rely on network embedding that tries to translate network neighborhood

into a well-meaning vector


Department of CSE Jan-May 2019 21 | Page

Impact Of User’s Relationship on User’s Interest

3.3 High Level System Architecture

As stated earlier, there are four major modules in this


project:

1. Network Creation

2. Visualization

3. Node Embedding

4. Link Prediction

Figure 3.2.1 – Module dependencies. The downward arrow shows that module at a lower level is dependent
on the

module at the higher level. Modules at the same level are independent of each
other.

Module interactions
1. The module Network Creation is the base module, and all other modules are
dependent

on this modules’ results. Without implementation of this module, the others


cannot be

implemented.

2. The Visualization module uses data from the Network Creation


Module

3. The Visualization and Node Embedding modules are independent of each


other.

Department of CSE Jan-May 2019 22 | Page

Impact Of User’s Relationship on User’s Interest

4. The Link Prediction Module is dependent on the Node Embedding Module’s


results.

Without performing Node Embedding of the four networks, there will be no data

available to the Link prediction module.


Figure 3.2.2 – A high level architecture diagram showing dependencies among the various modules. The
network

creation task has been split into Yelp Dataset (source), User Explicit Network (a sub-module of network
creation

task) User Business Bipartite Network (another sub-module) and User Implicit Network (sub-module of
network

creation). The submodules User Explicit Network and User-Business Bipartite Network are shown on the
same level

implying that their implementation is independent of each


other.

Department of CSE Jan-May 2019 23 | Page

Impact Of User’s Relationship on User’s Interest

1. Yelp Data is present in a json format and this is the input to the
system

2. Next stage is the data processing step where we use this input to create the user-
explicit

network and the user-business bipartite


network

3. Next stage is another data processing stage that uses the output from the
previous two

stages. Here, the user implicit network is formed based on users obtained from
the user

explicit network, and similarity is calculated based on the reviews written by the
user

which is captured in the user-business bipartite


network

4. Next Stage is graph/network (or node) embedding. We apply this process to all
the three

networks that have been formed in the previous step. We will be using Node2Vec
for

this, as it has been shown to outperform( or perform as good as) other


embedding

algorithms

5. Once the networks have been embedded in their respective vectors, we use this

information for the link prediction task using machine learning


models

6. The output will be the comparison of performance among the three models
generated

Department of CSE Jan-May 2019 24 | Page

Impact Of User’s Relationship on User’s Interest

CHAPTER 4

Environment
Requirements

4.1 Hardware Requirements

The large size of our dataset and resulting amount of computation have resulted
in these

requirements for hardware


specifications:

• Current System Usage:

o Memory: 16GB RAM

o Intel i7 6th Gen processor

o No GPU

• Ideal System Usage:

o Memory: 32GB RAM

o Processors, number of CPUs: Intel i7 8th Gen

o GPU: Nvidia Tesla

4.2 Software Requirements

4.2.1 Anaconda

• Freeand open-source distribution of the Python and R programming


languages

for scientific computing and package management and deployment, used


to

handle jupyter notebook

• Version: 5.0.2
4.2.2 Jupyter Notebook

• Jupyter Notebook is a web-based interactive computational environment for

creating IPython documents, using the open-source IPython


library

Department of CSE Jan-May 2019 25 | Page

Impact Of User’s Relationship on User’s Interest

• Version: 5.7.0

4.2.3 Python
• Programming language used for all graph and network
computation

• Version: 3.5

• Python libraries used:

▪ pandas

▪ numpy

▪ pickle

▪ json

▪ networkx

▪ node2vec

▪ sklearn

▪ matplotlib

o Pip3 for installing any packages that are not provided by default in the

Anaconda environment
4.2.4 User Interface and backend
o JavaScript

o HTML5

o CSS3

o backend - Flask

Software such as anaconda, python can be run on multiple operating systems and
hence there

is no OS requirement.

In case of Node2Vec, parallelization of generating walks, is not supported on Windows,


but

this does not impact the accuracy of the code. On Linux based operating systems such
as

Ubuntu, where it is claimed that Node2Vec parallelization works, joblib cannot perform
task-

splitting for a very large dataset


size.

Department of CSE Jan-May 2019 26 | Page

Impact Of User’s Relationship on User’s Interest The


User Interface is supported on all latest
versions of browsers like Chrome, Edge, Mozilla

Firefox.

4.3 Data Requirements

The dataset we will be working on is a bipartite product-based social media


dataset that:

• Offers both explicit user and item attributes

• Offers implicit user attributes ( text data such as reviews)


• Offers both explicit and implicit network

• Facilitates forming suitable ML problem (link prediction) such as visiting a

restaurant

The dataset we have decided to use, is from the ongoing Yelp Dataset Challenge for
Graph

Mining and contains users, explicit friends and reviewing businesses (businesses such
as

restaurants, barber shops, automotive shops, etc). The Yelp platform allows a user to
review

such a business, and connect with other


users.

The Yelp Dataset consists of the following JSON


files:

1. user.json – list of users along with information about friends, user id, review
count,etc

2. business.json – list of business and multiple attributes such as business id,


average

reviews, review count, business


attributes

3. review.json – list of all the reviews posted by users for the


businesses

4. tips.json – comments from users about what’s a businesses’


specialty

Our project uses JSON files 1, 2 and


3.

4.3.1 User.json

a. Original dataset has 1.7 million users

b. Size of this dataset is 2.03 GB


c. The following features are present in the
dataset

i. User ID

ii. Name

iii. Review count


Department of CSE Jan-May 2019 27 | Page

Impact Of User’s Relationship on User’s Interest

iv. Yelping Since (when the user joined Yelp)

v. Friends –list of user ids who are friends with the


user

vi. Fans

vii. Elite – list of years in which a user was elite

viii. Average Stars

ix. Various compliments received from other users such as writer,


photos,

funny, cool

x. Number of useful, funny, cool votes sent by the


user

d. For filtering such a large network, review count above 100 was chosen as
the

criteria

e. For creating the user explicit network, only User ID and Friends features
were

used.
Figure 3.4.1- A sample user JSON from user.json
file

4.3.2 Business.json

f. The original dataset has 188593


businesses

g. Size of the dataset is 139 MB

h. The following features are present in the


dataset

i. Business ID

ii. Business Name

iii. Business Address with Postal Code, City and


State

Department of CSE Jan-May 2019 28 | Page

Impact Of User’s Relationship on User’s Interest

iv. Latitude and Longitude

v. Stars – averaged ratings across all reviews


received

vi. Review count – number of reviews for this


business

vii. Is open (0 – if closed, 1 if open)

viii. Attributes such as Restaurant Offering takeout, Business Parking


types

ix. Categories such as Mexican, Burgers, Italian,


etc

x. Hours – operating timings for restaurant on each day of the


week
i. This data is used to create user feature vectors and the given granularity of
the

data a lot of these attributes can be used to form the user


vector

Figure 3.4.2 – A sample business JSON from business.json


file

4.3.3 Review.json

j. The original data has 5996996(nearly 6 million) reviews

k. Size of the Dataset is 4.39 GB

l. The following features are present in the


dataset

i. Review ID

Department of CSE Jan-May 2019 29 | Page

Impact Of User’s Relationship on User’s Interest


ii. User ID

iii. Business ID

iv. Review Rating

v. Review text

vi. Date of the review

vii. Number of useful votes received

viii. Number of funny votes received

ix. Number of cool votes received

m. For creating the user business network, this review dataset was filtered
using

the user id as we wanted only those users with review count more than
100.

Figure 3.4.3 – A sample review JSON from review.json


file

Yelp Data is available at the Yelp website [21] or can be downloaded from Kaggle
[22].
Department of CSE Jan-May 2019 30 | Page

Impact Of User’s Relationship on User’s Interest

CHAPTER 5

Demonstration of
Outcome

1. Analysis and visualization of the three


networks

a. We will provide information such as clustering coefficient, type of network


for

all the three networks

b. We will visualize the network using networkx graph visualization


methods

2. Tabular column with node2vec parameters used to embed the four


networks

a. Number of walks

b. Length of the walk

c. P

d. Q

e. Dimension of output vector

3. Tabular format for result comparison at link prediction task, reporting the
following:

a. Test accuracy

b. Precision

c. Recall
d. F1-Score

4. User Interface

a. We will create a simplistic UI that allows a user to

i. view all the businesses

ii. click on multiple businesses, say to review them

iii. select friends

iv. see recommended business suggested by the six


models

b. The figure below shows what our UI offers to an end-


user

Department of CSE Jan-May 2019 31 | Page

Impact Of User’s Relationship on User’s Interest


Figure 5.1 – Use case diagram of the User
Interface

Department of CSE Jan-May 2019 32 | Page

Impact Of User’s Relationship on User’s Interest

CHAPTER 6

Proposed
Approach

6.1 Creating User Explicit network of friends

a. Yelp user.json dataset contains 1.5 Million users which is too large to be
processed by

a typical machine (hardware


constraint)

b. Hence the users are first filtered based on the number of reviews they have
published
(considered to be 100 and
above)

c. This is based on the assumption that eliminating users who have published very
few

reviews will not cause any information loss. Essentially, we look at a subset of
the

possible graph containing users that publish many


reviews.

6.2 Creating User Business bipartite network

a. Again, we consider only the those users’ reviews who have published more than
100

reviews (dependency on user explicit network


data)

b. Each review present in the business.json after filtering, is represented as an edge

between that particular user and business


node

6.3 Creating Implicit Network Type 1

The implicit network type 1 is based on similarity of users visiting the same
business

The following algorithm was used to create the


network:

a. For each user pair U1, U2 in filtered


user.json

a. Get set S1 of businesses reviewed by U1 from user business


network

b. Get list S2 of businesses reviewed by U1 from user business


network
c. Compute Jaccard similarity J for (U1, U2) as follows:

i. J = | S1 ∩ S2 | |S1
U S2|

d. If J >= 0.5, create an edge between U1 and


U2

Department of CSE Jan-May 2019 33 | Page

6.4
Impact Of User’s Relationship on User’s Interest

Creating Implicit Network Type 2

The implicit type 2 network is based on similarity of users who visit similar
restaurants

Here, business attributes such as cuisines are considered. For example, users who visit
Italian

restaurants are considered similar


users..

A user feature vector is created which consists of the number of businesses visited by
the user

categorized by cuisines. For example, if a user has visited 3 businesses that serve
Italian cuisine,

2 with American cuisine, 5 with Mexican cuisine, then the user vector would look
like

[3 , 2 , 5]

Note that a business that offers multiple cuisines will be counted multiple times. For
example

a restaurant that offers Italian and American cuisines but not Mexican would have a
vector like:

[1 , 1 , 0]

This business vector is added to the user


vector.
6.5 Node Embedding

Embedding was performed for each of the four networks. The Node2Vec parameters
used are

presented as follows:

• For all four networks in-out parameter was chosen to be


1

• For all four networks the return parameter was chosen to be


1

• For all four networks, the walk length was chosen to be


30

• The differences in parameters are summarized


below

Network Output Dimension Number of walks

User explicit 64 200

User business 32 100

Implicit type 1 64 200

Implicit Type 2 32 100

Table 6.5.1 - Node2Vec Parameters for the four networks

Department of CSE Jan-May 2019 34 | Page

Impact Of User’s Relationship on User’s Interest


6.5.1 Constraints for Embedding

The reason for using fewer walks and a lower output dimension for the user
business

was to not increase dataset dimensionality(more columns) compared to dataset


size(number of rows) when combining it with other network
embedding.

As the implicit type 2 network was much larger than implicit type 1, the dimension
of

embedding vector was kept at 32 in order to reduce RAM consumption when


running

the code for Node2Vec.

While Node2Vec offers a way to load big graphs that may not fit into memory, it
also

slows down the processing of the graph. For example, processing the implicit
type 2

network would take roughly 600 hours when this approach was used. Owing to
very

large processing time, this method was was not


used.

For the user business network and implicit network type 2, which were extremely
large,

the edgelist file (text file containing the list of edges in a network) was restricted
to

6MB in order to run the embedding process with a 15.6GB RAM without running
out

of memory.

The embedding of the nodes in each of these networks were used for link
prediction

task.

6.6 The Link Prediction Problem

a. We will essentially compare the effectiveness of user explicit network alone vs.
implicit

networks

b. We will also investigate the combination of the explicit and implicit networks with
user-

business network features

c. User being interested in a particular business will be modelled as a link prediction


task

d. Baseline model : with user explicit network embedding


alone

e. Non-baseline models:

a. Implicit Type 1 network

b. Implicit Type 2 network

c. Combination of user explicit network and user-business bipartite


network

Department of CSE Jan-May 2019 35 | Page

Impact Of User’s Relationship on User’s Interest

d. Combination of implicit Type 1 network and user-business bipartite


network

e. Combination of implicit Type 2 network and user-business bipartite


network

f. Link prediction task is modelled as a binary classification


problem:

a. Prediction of class 0 implies no edge formed between user and


business

b. Prediction of class 1 implies edge formation between user and


business

g. The presence of an edge between user and business is a positive instance while
the

absence is a negative instance. As both are needed in order to train a model, a


residual
graph of the user-business network was created, and then randomly sampled to
get

required number of edges of the residual graph to roughly balance the number of

positive and negative instances for training and


testing.

h. Train and test data had the following attributes used for
prediction:

a. Embedding for user from explicit network (64 dimension) + business


id

b. Embedding for user from implicit network Type 1 (64 dimension) +


business id

c. Embedding for user from implicit network Type 2 (32 dimension) + business
id

d. Embedding for user from explicit network + Embedding for business from
user

business network (32


dimensions)

e. Embedding for user from implicit network Type 1 + Embedding for business

from user business network

f. Embedding for user from implicit network Type 2 + Embedding for business

from user business network

g. Each instance in the train and test had the node embedding as described
above.

h. All the six combinations were tested for their accuracy on the test
data.

6.7 Demo Approach

Step 1 – Front-end functions:

1. Display a list of users


2. Allow new user to select at least 5 friends

3. Display a list of businesses

4. Allow new user to select at least 5 businesses

5. Once the above conditions are satisfied perfrom steps 6 and 7

Department of CSE Jan-May 2019 36 | Page

Impact Of User’s Relationship on User’s Interest 6. Create a list


of user_id selected by the new user

7. Create a list of business_id selected by the new user

8. Send lists in 6 and 7 to the backend

Step 2- Back end Functions:

1. We have to create 4 networks for the New User

a. User Explicit

b. User Implicit – Type 1

c. User Implicit – Type 2

d. User Business

2. We have to perform embedding for each of the four networks

3. Create Datasets ( 6 different types)

4. Use the saved SVM models to predict ‘Link Column’

5. Whichever Links are predicted as 1, will be recommended to the user

6. Backend will send list of recommended business_id for each of the 6 models

Step 3- Front-end Functions:


1. Create 6 separate div tags for showing the recommended businesses for each
model

2. Display business cards (as done in step 1)

3. Logical view of the recommendations page

Department of CSE Jan-May 2019 37 | Page

Impact Of User’s Relationship on User’s Interest


Department of CSE Jan-May 2019 38 | Page
Figure 6.7.1-A sample view of the UI for recommendations page

Impact Of User’s Relationship on User’s Interest

CHAPTER 7

Detailed Design

7.1 Creating the networks

1. Since the number of users was too large for processing, the users were
filtered

based on the review count (above


100)

2. This filtering resulted in 67000 users

3. After filtering the users, the reviews.json was also


filtered.

4. The reviews written by the same 67000 users were


considered

5. This resulted in a 1.35 GB review file


(filtered_reviews.json)

6. This filtered review file was used to create the user-business


network

7. The business.json file was relatively small and did not require any
filtering

8. For each of the 67000 users, the friends list was extracted and the user
explicit

network was created

9. Using the user-business network, each user pair was taken, and the
jaccard

similarity was computed as follows: J = |A |A


∩ B| ∪ B|

Where A is the set of businesses visited by one user, and B is the set of
businesses

visited by the second user

A ∩ B represents the set of businesses reviewed by


both users

A ∪ B represents the set of all businesses viewed by either one or


both of the users

10. Similarity threshold of 0.5 was used to preserve an edge between two users
i.e. all

user pair with similarity below 0.5 were discarded. This resulted in the
implicit

type 1 network

11. The implicit type 1 network connects users that have reviewed the
same

businesses

12. For the implicit type 2 network, user-business network and the
business.json file

were used to create the user feature vector which was initially
computed

considering the following attributes:


Department of CSE Jan-May 2019 39 | Page

Impact Of User’s Relationship on User’s Interest

i. bar

ii. burgers

iii. sandwiches

iv. thai

v. mexican

vi. italian
vii. american

viii. coffee

ix. tea

x. chinese

xi. asian

xii. barbeque

xiii. french

xiv. indian

xv. pakistani

xvi. middle_eastern

xvii. japanese

xviii. shopping

xix. fashion

xx. flowers

xxi. gifts

xxii. restaurants

xxiii. dessert

xxiv. breakfast

xxv. lunch

xxvi. dinner

xxvii. brunch

xxviii. RestaurantsTakeOut

xxix. BusinessAcceptsCreditCards

xxx. RestaurantsReservations
Department of CSE Jan-May 2019 40 | Page

Impact Of User’s Relationship on User’s Interest


Department of CSE Jan-May 2019 41 | Page
13. Since most of the features used were applicable to restaurant business, we
dropped

features like fashion, flowers, gifts. The restaurants feature was redundant
as we

considered only restaurant


businesses.

14. Upon further inspection, it was observed that attributes like whether a
restaurant

was good for desserts, lunch, breakfast, etc, were extremely sparse as
multiple

restaurants did not have such information. Hence these features (xxiii to
xxvii)

were also dropped.

Figure 7.1.1 A snapshot of the user feature vectors file

15. For each user pair, the cosine similarity was computed as
follows:
U1.U2
cosine(U1,U2) =

√∑ U1i2 ni ∗ √∑ U2i2 ni

where i is each element in the respective vector and n the is vector length.

Note that both vectors must be of the same


size.

16. As there were a very large number of similar users, a similarity threshold of 1
was

used to preserve an edge between a pair of


users.
17. The implicit type 2 network connects users that visit similar types of
businesses

Impact Of User’s Relationship on User’s Interest

Figure 7.1.2 - High Level Flowchart for network creation

7.2 Performing Node2Vec

We initially tried Node2Vec with this data on a 16GB RAM Intel i7 processor, but
the

size proved to be too large.

While Node2Vec allows for parallelization of generating random walks, it does


not

work on Windows OS and hence we had to run it serially, making it a very time

consuming process (over 2 days for


processing)

We tried running Node2Vec on a native Linux (Ubuntu) machine with


parallelization,

but the sub-tasks were still too large to be handled and the process
failed.

Department of CSE Jan-May 2019 42 | Page

Impact Of User’s Relationship on User’s Interest


Department of CSE Jan-May 2019 43 | Page
Node2Vec was run on Google Cloud with a Linux instance, however due to some

issues, the runtime would disconnect from the cloud instance and the process
would

stall.

Since Cloud provided limited credits to run the process for free, and more
powerful

instances were very expensive, this approach of running code was


abandoned

Hence we were forced to randomly remove the edges and use the remaining
edges for

performing node embedding

It was observed that files over 6MB would cause out-of-memory errors when
using

Node2Vec, on the 16GB

Thus, the implicit network type 2 and user business network were filtered by
randomly

sampling the edges.

On performing Node2Vec, embedding for the nodes in the respective networks


were

retrieved.
Figure 7.2.1 High Level Flowchart for Network Embedding using Node2Vec

Impact Of User’s Relationship on User’s Interest

7.3 Link prediction

As stated earlier, 6 models were created and compared for accuracy, precision, recall
and F1-

Score

In each dataset the negative instances (user and business where link does not exist)
were

extracted from the residual graph of the original user-business


network

7.3.1 Model 1: User Explicit Network

The following features were used:

i. Each element of the 64-dimensions embedding vector of a node in the


explicit

network was considered a


feature

ii. Business id.

The Support Vector machine model was trained on a total of 65 features, predicting 0 or
1 class

for the Link

7.3.2 Model 2 : User Implicit Network – Type 1

The following features were used:

i. Each element of the 64-dimensions embedding vector of a node in the


implicit

type 1 network was considered a


feature
ii. Business id.

The Support Vector machine model was trained on a total of 65 features, predicting 0 or
1 class

for the Link

7.3.3 Model 3 : User Implicit Network – Type 2

The following features were used:

i. Each element of the 32-dimensions embedding vector of a node in the


implicit

type 2 network was considered a


feature

ii. Business id.

The Support Vector machine model was trained on a total of 33 features, predicting 0 or
1 class

for the Link

Department of CSE Jan-May 2019 44 | Page

Impact Of User’s Relationship on User’s Interest

7.3.4 Model 4 : User Explicit Network + User Business


Network

The following features were used:

i. Each element of the 64-dimensions embedding vector of a user node in the

explicit network was considered a


feature

ii. Each element of the 32-dimensions embedding vector of a business node in


the

user business network was considered a


feature
The Support Vector machine model was trained on a total of 96 features, predicting 0 or
1 class

for the Link

7.3.5 Model 5 : User Implicit Network – Type 1 + User Business


Network

The following features were used:

i. Each element of the 64-dimensions embedding vector of a user node in the

implicit network – Type 1 was considered a


feature

ii. Each element of the 32-dimensions embedding vector of a business node in


the

user business network was considered a


feature

The Support Vector machine model was trained on a total of 96 features, predicting 0 or
1 class

for the Link

7.3.6 Model 6 : User Implicit Network – Type 1 + User Business


Network

The following features were used:

i. Each element of the 32-dimensions embedding vector of a user node in the

implicit network – Type 1 was considered a


feature

ii. Each element of the 32-dimensions embedding vector of a business node in


the

user business network was considered a


feature
The Support Vector machine model was trained on a total of 64 features, predicting 0 or
1 class

for the Link

Department of CSE Jan-May 2019 45 | Page

Impact Of User’s Relationship on User’s Interest


Department of CSE Jan-May 2019 46 | Page
Figure 7.3.1- High Level Flowchart for Link Prediction

Impact Of User’s Relationship on User’s Interest

CHAPTER 8

Results

8.1 Visualizations of networks

The user explicit network and the user implicit networks (both Type 1 and Type
2) were

visualized in order to compare their network


structure.

To achieve this, we present the visualization of the entire network, the degree

histogram, the Logarithmic graph of degree count. Network parameters such as


the

number of nodes and edges, network density, maximum degree and global
clustering

co-efficients have also been


mentioned.

8.1.1 User explicit network of friends


Figure 8.1.1.1 Degree histogram of the user explicit network

Department of CSE Jan-May 2019 47 | Page

Impact Of User’s Relationship on User’s Interest

Figure 8.1.1.2- Logarithmic plot for degree and count for explicit network
Figure 8.1.1.3- Visual for the entire user explicit network

Department of CSE Jan-May 2019 48 | Page

Impact Of User’s Relationship on User’s Interest 8.1.2

User implicit network type 1

Figure 8.1.2.1- Degree histogram of the user implicit network type 1

The maximum degree here is a little over 200.


Figure 8.1.2.2- Logarithmic plot for degree and count for implicit network type 1

Department of CSE Jan-May 2019 49 | Page

Impact Of User’s Relationship on User’s Interest

Figure 8.1.2.3- Visual for the entire user implicit network type 1

8.1.3 User implicit network type 2


Figure 8.1.3.1- Degree histogram of the user implicit network type 2

Maximum degree here is over 4000.

Department of CSE Jan-May 2019 50 | Page

Impact Of User’s Relationship on User’s Interest

Figure 8.1.3.2 - Logarithmic plot for degree and count for implicit network type 2
Figure 8.1.3.3 Visual for the entire user implicit network type 2

8.1.4 Comparison of the three networks

It is expected to see relatively higher clustering in the implicit networks as users are
grouped

based on similarity.

Network density is poor across all, as only selective edges are


present.

Using Type 2 implicit network, there are over 7 million edges compared to 1 million
edges in

the explicit network and 200,000 edges in implicit Type 1. The high number of edges
can be

attributed to relaxing the similarity constraint – rather than considering exact businesses
visited,

similar businesses are


considered.
Department of CSE Jan-May 2019 51 | Page

Impact Of User’s Relationship on User’s Interest It


is also observed that the Degree-Count
logarithmic plot is not a straight line with positive

slope. This implies that the networks are not real-world


networks.

Parameter User explicit User implicit


User implicit network User implicit

Network of friends network type 1


type 2 network type 1

61725 18231 22668


Number of nodes

1912069 211493 7673278


Number of edges
<1000 ~200 >4000
Maximum Degree

0.00100 0.00127 0.02987


Network Density

Global clustering 0.56169


0.79355
co-efficient

Department of CSE Jan-May 2019 52 | Page


Table 8.1.4.1 – Comparison of Network parameters for the three networks

8.2 Link prediction results


Our best performing model was combination 6 – user implicit network type 2 with
business

node embedding giving us an accuracy of 89% with F1-score for class 1 as 0.901
and F1-

score for class 0 as 0.895.

The Accuracy, Recall, Precision and F1-Scores have been presented for both
classes 0 (no

link) and 1 (link is present)

Impact Of User’s Relationship on User’s Interest (c


(class 1)
Data Combination Accuracy Recall
(class 1)
Precision
(class 1)
F1-Score
F1-Score

User Explicit 54.7% 0.938 0.161 0.275

User Implicit - Type 1 76.6% 0.984 0.541 0.698

User Implicit - Type 2 75.1% 0.553 0.915 0.690

User Explicit with business embedding User Implicit - Type 1 with business
50% Not defined 0 Not
embedding
(model does not predict 1s) 79.1% 0.789 0.996 0.880
defined
Department of CSE Jan-May 2019 53 | Page
User Implicit - Type 2 with business embedding
89.8% 0.924 0.879 0.901

Table 8.2.1- Model Accuracy and Recall, Precision F1-Score for class 1

Data Combination Accuracy Recall (class Precision


Precision

F1-Score (class 0)
(class 0)
(class 0)

User Explicit 54.7% 0.507 0.987 0.670

User Implicit - Type 1 76.6% 0.683 0.991 0.809

User Implicit - Type 2 75.1% 0.948 0.680 0.792

User Explicit with business embedding 50% 0.5 1 0.667

User Implicit - Type 1 with business User Implicit - Type 2 with business

embedding embedding
79.1% 0.884 0.085 0.156 89.8% 0.872 0.920 0.895

Table 8.2.2- Model Accuracy and Recall, Precision F1-Score for class 0

Impact Of User’s Relationship on User’s Interest It


is also likely that dimensionality plays a role
here. User explicit network with the business

network embedding and User Implicit - Type 1 with business embedding had 96
(64+32)

features with which to perform link prediction. Given the relatively smaller size of
the

datasets, and high dimensionality, the accuracy might be reduced. User Implicit -
Type 2
network with business embedding had 64 (32+32) features with which to perform
link

prediction. Since the number of features here is lesser, there is a tendency towards
better

accuracy.

Except for the user explicit network, adding the business node embedding from the
user-

business bipartite graph has impacted accuracy in a positive manner. This shows
that user-

business network combined with implicit network has a very good


performance.
Department of CSE Jan-May 2019 54 | Page

Impact Of User’s Relationship on User’s Interest

CHAPTER 9

Conclusions

Social Networks contain very useful information about users that may be used to
predict their

interests. The ties between two nodes represent a relationship between the two
nodes. This

relationship may be explicit or implicit.

Implicit networks have performed better than explicit network with similar
configurations.

Using the implicit network based on cuisines (Type 2) along with the business
embedding

gives the best results in terms of


accuracy.

Using a combination of networks rather than a single network alone, improves link
prediction

accuracy. This is because of the fact that both node types i.e.; user and business
should be

represented as embedding vectors for the model to perform the link prediction
between the

two.

The dataset dimensionality should be in proportion to the number of instances. Very


high

dimensions may lead to redundancy and may impact the accuracy of the machine
learning
model.

Department of CSE Jan-May 2019 55 | Page

Impact Of User’s Relationship on User’s Interest

CHAPTER 10

Future Work

Creating the user business graph can be more complex than what we have considered,
by using

the ratings as edge weights. Rather than just performing a binary classification for
predicting

links between user and business (0 and 1), we can predict the ratings for that link on a
scale of

1-5.

Rather than considering only cuisines from the set of business attributes, for creating
implicit

network, additional information can be gathered by performing text mining on the


reviews

written by users. This can be used to personalize recommendations for the user, by
extracting

what the user particularly liked and disliked about the


business.

While the current work limited to restaurants, other types of businesses can be
recommended

based on business specific attributes. As an example financial services such as banks


can be

analyzed based on type of customer service, reputation, ease of getting


loans, etc.

Department of CSE Jan-May 2019 56 | Page

Impact Of User’s Relationship on User’s Interest


CHAPTER 11

References

[1] Empirical Analysis of Predictive Algorithms for Collaborative


Filtering

[2] Suggesting (More) Friends Using the Implicit Social Graph by Maayan Roth,
Tzvika

Barenholz, Assaf Ben-David, David Deutscher, Guy Flysher, Avinatan Hassidim, Ilan
Horn,

Ari Leichtberg, Naty Leiser, Yossi Matias, Ron


Merom

[3] Learning to Recommend with Explicit and Implicit Social Relations by HAO MA, IRWIN
KING,

and MICHAEL R. LYU

[4] Node2Vec: Scalable Feature Learning for Networks by Aditya Grover, Jure
Leskovec

[5] LINE: Large-scale Information Network Embedding by Jian Tang, Meng Qu, Mingzhe
Wang,

Ming Zhang, Jun Yan, Qiaozhu Mei

[6] DeepWalk: online learning of social representations by Bryan Perozzi, Rami Al-Rfou,
Steven

Skiena

[7] Analyzing Implicit Social Networks in Multiplayer Online Games by Alexandru Iosup, Ruud
van

de Bovenkamp, Siqi Shen, Adele Lu Jia, Fernando


Kuipers

[8] Identifying Implicit Relationships between Social Media Users to Support Social
Commerce by

Christopher C. Yang, Xuning Tang, Haodong Yang, Ling


Jiang

[9] Measuring Tie Strength in Implicit Social Networks by Mangesh Gupte, Tina Eliassi-
Rad

[10] Search Lens : composing and capturing complex user interests for exploratory search
by Chee

Chang, Joseph & Hahn, Nathan & Perer, Adam & Kittur,
Aniket.

[11] Graph Embeddings for Social Network Analysis: State of the Art by Paul Compagnon,
Kilian

Ollivier

[12] Graph Representation Learning and Graph Classification by Sara


Riazi

[13] DeepInf: Modeling Influence Locality in Large Social Networks by Jiezhong Qiu, Jian
Tang,

Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie


Tang

[14] Attributed Social Network Embedding by Lizi Liao, Xiangnan He, Hanwang Zhang, and
Tat-

Seng Chua

Department of CSE Jan-May 2019 57 | Page

Impact Of User’s Relationship on User’s Interest

[15] Network Representation Learning with Rich Text Information by Cheng Yang, Zhiyuan Liu,
Deli

Zhao, Maosong Sun, Edward Y.


Chang

[16] Heterogeneous Network Embedding via Deep Architectures by Shiyu Chang, Wei Han,
Jiliang

Tang, Guo-Jun Qi, Charu C. Aggarwal, Thomas S.


Huang
[17] https://github.com/aditya-
grover/node2vec

[18] https://github.com/eliorc/node2vec

[19] Link Prediction in Heterogeneous Social Networks by Sumit Negi, Santanu


Chaudhury

[20] A Link Prediction Approach to Recommendations in Large-Scale User-Generated


Content

Systems by Nitin Chiluka, Nazareno Andrade, and Johan


Pouwelse

[21] https://www.yelp.com/dataset

[22] https://www.kaggle.com/yelp-dataset/yelp-
dataset
Department of CSE Jan-May 2019 58 | Page

Impact Of User’s Relationship on User’s Interest

CHAPTER 12

Appendices

12.1 Implicit Network Type 2 after randomly sampling the


edges

to reduce the size

Number of nodes: 4017

Number of Edges: 258843

Network Density : 0.03209008625691409

Global clustering co-efficient :


0.793551512149485

Figure 12.1.1- Degree histogram of the subgraph of user implicit network type 2 after random sampling of
edges from original

Department of CSE Jan-May 2019 59 | Page

Impact Of User’s Relationship on User’s Interest

Figure 12.1.2 - Logarithmic plot for degree and count for subgraph of implicit network type 2
Figure 12.1.3 Visual for the subgraph of user implicit network type 2

12.2 Visualization code

Reading a graph from an edgelist file

Number of nodes and


edges

Department of CSE Jan-May 2019 60 | Page

Impact Of User’s Relationship on User’s Interest

Network Density

Degree distribution histogram


Degree-count logarithmic plot

Global clustering co-efficient

Department of CSE Jan-May 2019 61 | Page

Impact Of User’s Relationship on User’s Interest


Graph Visualization

12.3 Network creation code


Figure 12.3.1- Creating the user explicit network of friends

Department of CSE Jan-May 2019 62 | Page

Impact Of User’s Relationship on User’s Interest


Department of CSE Jan-May 2019 63 | Page
Figure 12.3.2- Creating the user-business review network

Impact Of User’s Relationship on User’s Interest


Department of CSE Jan-May 2019 64 | Page
Figure 12.3.3- Creating the implicit network type 1

Impact Of User’s Relationship on User’s Interest


Department of CSE Jan-May 2019 65 | Page
Figure 12.3.4- Creating the implicit network type 2

Impact Of User’s Relationship on User’s Interest

12.4 Node embedding code

Figure 12.4.1- Performing Node Embedding using Node2vec

12.5 Link prediction code


Figure 12.5.1- Creating the dataset by adding embedding vector

Department of CSE Jan-May 2019 66 | Page

Impact Of User’s Relationship on User’s Interest


Department of CSE Jan-May 2019 67 | Page
Figure 12.5.2- Training SVM Model

Figure 12.5.3- Testing the SVM model and computing accuracy

Impact Of User’s Relationship on User’s Interest

Figure 12.5.4- Computing Recall, Precision, F1-score for each class k

UI snapshots
Figure 12.6.1 UI Snapshot 1 - Business Cards selected by user, are highlighted in Dark blue. The number in
top right corner indicates number of businesses selected

Department of CSE Jan-May 2019 68 | Page

You might also like